IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In re application of: 
TORMASOV et al. 
Appl.No.: 09/918,032 
Filed: July 30, 2001 

For: Distributed Network Data Storage 
System and Method 

Declaration of Alexander Tormasov, Mikhail Khassine, Serguei Beloussov and 
Stanislav Protassov under 37 C.F.R. § 1.131 

Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

The undersigned, Alexander Tormasov, Mikhail Khassine, Serguei Beloussov and 
Stanislav Protassov declare and state that, 

1. We are the inventors of the above-captioned application, U.S. Appl. No. 09/918,032, 
filed July 30, 2001. 

2. Prior to December 14, 2000, we, the inventors, had completed our invention in a 
WTO country (specifically, one of the inventors, Serguei Beloussov, was, during the relevant 
time period, in the United States and in Singapore), as claimed in the subject application, 
evidenced by the following: 

3. Exhibit A, entitled "Redundant Clustered Distributed File System," version 0.3, was 
previously submitted, and confirms the date of conception prior to the filing date of Boykin et 
al., U.S. Patent Publication No. 2002/0078461 and Lahr et al., U.S. Patent Publication No. 
2002/0046405 (i.e., prior to December 14, 2000). 

4. Exhibit D is a document in Russian dated prior to December 14, 2000, discussing the 
Topological Server ("TopD," the name given to each node of the cluster) and describing system 
architecture. Exhibit E is the same document with an informal translation of some portions 
added to it in [BRACKETS]. (The untranslated portions deal primarily with the details of packet 
transfer; however, Applicants will provide a translation of the rest of the document if the 
Examiner deems it necessary.) 

5. Exhibit F is a figure illustrating the topology of the system in which the Topological 
Server ("TopD") "lives," which was also created prior to December 14, 2000. 

6. Claim 26 recites the following: 

26. A system for organizing distributed file storage comprising: 

a plurality of servers providing, to a plurality of clients, file access 
services for accessing files stored on the plurality of servers; and 

a list of neighbor servers maintained by each server, wherein the neighbor 
servers are a subset of the plurality of servers, 

wherein the files are divided into a plurality of pieces stored on the 
plurality of servers, and 

wherein the list is used to obtain information for reconstructing files stored 
on the distributed file system. 

7. The plurality of servers and the list of neighbor servers are discussed, for example, at 
page 5 of Exhibit A ("each node in a cluster have the same control functions," "each node have 
information only about some adjacent neighbors it interested in"), and at page 7 of the document 
("Logically a node autoconfigure its place with the nearest neighbors to find active participants 
for redundant data storage. The file system daemons maintain permanently opened connection to 
node neighbors."). That each server (node) has to maintain a list of the neighboring servers is 
self-evident from the overall discussion. 

8. The list of neighbor servers is also discussed in Exhibit D/E, a document in Russian 
discussing the Topological Server and describing system architecture. For example, the 
following portion of Exhibit E discusses the neighbor list: 

При запуске TopD получает список соседей (neighbour list), некоторые из 
этих соседей будут статическими, то есть TopD будет стараться 
поддерживать постоянную связь с ними, остальные динамическими, то есть 
они будут заменяться с помощью алгоритма динамического 
конфигурирования. Суть этого алгоритма в том, чтобы динамическими 
соседями TopD являлись наиболее близкие (по некоторым критериям) ноды 
TorFS. Каждый сосед в списке имеет свой номер, который используется при 
маршрутизации пакетов. Соседом номер 0 является сам TopD. 

[UPON LAUNCH, THE TOPD RECEIVES A LIST OF NEIGHBORS, SOME 
OF WHICH MIGHT BE STATIC, IN OTHER WORDS, TOPD WILL 
ATTEMPT TO MAINTAIN CONSTANT COMMUNICATION WITH THEM, 
AND OTHERS WILL BE DYNAMIC, IN OTHER WORDS, THEY WILL BE 
REPLACED USING A MECHANISM OF DYNAMIC CONFIGURATION. 
THE GIST OF THE ALGORITHM IS IN ENSURING THAT THE DYNAMIC 
NEIGHBORS OF TOPD ARE THOSE NODES THAT ARE THE CLOSEST 
(BY SOME CRITERIA) OF TORFS. EACH NEIGHBOR IN THE LIST HAS 
ITS OWN NUMBER, USED FOR ROUTING PACKETS. THE NEIGHBOR 
NUMBER 0 IS THE TOPD ITSELF.] 
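
By way of illustration only, the neighbor-list behavior described in the translated passage can be sketched in Python. This sketch is not taken from the exhibits: the class name `NeighborList`, the `max_dynamic` cap, the node identifiers and the `distance` mapping are all assumptions made for the example. The passage itself says only that some neighbors are static, that the others are replaced by the closest nodes "by some criteria," and that neighbor number 0 is the TopD itself.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborList:
    """Hypothetical per-node neighbor list; all names are illustrative."""
    self_id: str
    static: list[str]                  # neighbors the node always tries to keep
    max_dynamic: int = 2               # assumed cap on dynamic neighbors
    dynamic: list[str] = field(default_factory=list)

    def refresh(self, distance: dict[str, float]) -> None:
        """Replace the dynamic neighbors with the closest known nodes.

        `distance` maps a node id to a closeness metric. The metric is an
        assumption; the source only says "by some criteria".
        """
        candidates = [n for n in distance
                      if n != self.self_id and n not in self.static]
        candidates.sort(key=lambda n: (distance[n], n))
        self.dynamic = candidates[: self.max_dynamic]

    def numbered(self) -> dict[int, str]:
        """Number each entry for routing; number 0 is the node itself."""
        return dict(enumerate([self.self_id, *self.static, *self.dynamic]))


nodes = NeighborList(self_id="topd-a", static=["topd-b"])
nodes.refresh({"topd-b": 1.0, "topd-c": 5.0, "topd-d": 2.0, "topd-e": 9.0})
table = nodes.numbered()
```

In this sketch the static neighbor keeps its slot across refreshes, while the dynamic slots are refilled from whichever nodes currently rank closest.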

9. "File access" is discussed in numerous places in Exhibits A and D/E, for example, 
section 7, entitled "File Access," on pages 9-11 of Exhibit A. 

10. The plurality of clients are discussed, e.g., in section 10 on page 12 of the document, 
where Virtual Environments (VEs) are addressed: 

Each VE within a local node should receive its own separate filesystem 
namespace begins from root. For the purpose of disk space optimization, we 
will provide a "copy-on-write" mechanism within a local node. 

From global node namespace (not within VE) the given filesystem subtree is 
assigned for VE support. Within it, each VE receives its own subtree for the 
VE root namespace. Also, some another subtree has been assigned a status of 
"VE template". 

For file read access, the VE personal subtree (/.ve/VE-ID/...) has been 
checked first. If file exists, it is accessed from here. If file here does not exists, 
VE template subtree has been checked. On success the special kind of link 
has been created within VE personal subtree. This link copies all file 
attributes while for the file content application is been redirected to VE 
template disk space. 

Normally, VEs function similar to a standalone computer and service one or several 
clients. 
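
By way of illustration only, the read-access order quoted above (personal subtree first, then the template subtree, with a link created on success) can be sketched in Python. The in-memory dictionaries, the function name `resolve_read` and the `link_to` field are assumptions made for the example and do not come from Exhibit A; only the `/.ve/VE-ID/...` path layout and the lookup order are taken from the quoted passage.

```python
def resolve_read(ve_id: str, path: str, personal: dict, template: dict) -> bytes:
    """Resolve a read: VE personal subtree first, then the VE template subtree."""
    key = f"/.ve/{ve_id}/{path}"
    entry = personal.get(key)
    if entry is not None:
        if entry.get("link_to") is not None:
            # A link entry redirects content reads to the template disk space.
            return template[entry["link_to"]]["content"]
        return entry["content"]
    if path in template:
        # On success, create a link that copies the attributes but not the content.
        personal[key] = {"attrs": dict(template[path]["attrs"]), "link_to": path}
        return template[path]["content"]
    raise FileNotFoundError(path)


template = {"etc/motd": {"attrs": {"mode": 0o644}, "content": b"hello"}}
personal: dict = {}
first = resolve_read("101", "etc/motd", personal, template)
second = resolve_read("101", "etc/motd", personal, template)
```

The first read creates the link in the personal subtree; the second read follows that link, so the file content is never duplicated for the VE.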

11. Also, the plurality of clients are expressly shown in Exhibit F. 

12. The aspect "wherein the files are divided into a plurality of pieces stored on the 
plurality of servers" is discussed, for example, at the top of page 6 of Exhibit A: 

each file stored in form of pieces and amount of pieces could vary; each piece 
exist in the only exemplar in system . . . Pieces are stored on distributed 
nodes, provided high availability 

13. Also, page 9, section 6 of Exhibit A, entitled "Regulated redundancy and versioning," 
discusses the use of the so-called (N,K) algorithms for dividing files into N pieces (in the field of 
cryptography, it is understood that typically, though not always, each of the N pieces is stored on 
a different server), such that any K out of the N pieces can be used to reconstruct the file: 

Each block stored in form: divided into N pieces, and any K pieces of them 
(K <= N) are enough to assemble initial block. Size of each block are 
minimized and equal to (size of block)/K with minimal overhead of some 
additional info in header. Appropriate mathematics are exists. 

Typically file divided into at least N = (K+M) pieces, where M - amount of 
the nodes which could disappear simultaneously. Each block is stored on a 
separate remote node. There are K+M individual nodes with block pieces 
and the removal of M of them still gives an ability to restore block. 

Maximal amount of possible pieces are restricted in the very initial stage of 
piece creation - and should be reasonable large (up to K*L where L - max 
nodes in cluster). 
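
By way of illustration only, one well-known way to obtain the "any K of N" property is polynomial evaluation over a prime field, sketched below in Python. Exhibit A does not state which mathematics the inventors used ("Appropriate mathematics are exists"), so the choice of the field GF(257), the function names `encode` and `decode`, and the sample block are assumptions made for the example. Each piece holds (size of block)/K symbols, which matches the piece size described in the quoted passage; in this sketch a symbol needs slightly more than 8 bits.

```python
from itertools import combinations

P = 257  # assumed prime field; every byte value 0..255 is a field element


def encode(block: bytes, n: int, k: int) -> list[list[int]]:
    """Derive n pieces from `block`; any k of them suffice to rebuild it."""
    assert 1 <= k <= n < P
    padded = block + bytes(-len(block) % k)        # pad to a multiple of k
    pieces: list[list[int]] = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        coeffs = padded[start:start + k]           # k bytes = polynomial coefficients
        for i in range(n):
            x = i + 1                              # piece i stores the value at x = i + 1
            pieces[i].append(sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
    return pieces


def invert_mod_p(matrix: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inversion of a square matrix over GF(P)."""
    size = len(matrix)
    aug = [row[:] + [int(r == c) for c in range(size)] for r, row in enumerate(matrix)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)            # modular inverse (Python 3.8+)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def decode(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the block from any k pieces, keyed by piece index."""
    if len(available) < k:
        raise ValueError("fewer than k pieces available")
    chosen = sorted(available)[:k]
    inverse = invert_mod_p([[pow(i + 1, j, P) for j in range(k)] for i in chosen])
    out = bytearray()
    for pos in range(len(available[chosen[0]])):
        values = [available[i][pos] for i in chosen]
        for row in inverse:                        # recover the k coefficients
            out.append(sum(a * v for a, v in zip(row, values)) % P)
    return bytes(out[:length])


block = b"distributed file storage"
K, M = 3, 2
pieces = encode(block, K + M, K)
recovered = [decode({i: pieces[i] for i in keep}, K, len(block))
             for keep in combinations(range(K + M), K)]
```

With K = 3 and M = 2, every one of the ten possible 3-piece subsets rebuilds the block, so any two of the five pieces may be lost. The pieces are derived from the block contents rather than being plain sub-portions of it.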

14. This discussion therefore describes one example of an algorithm that can be used to 
distribute the data of the file across a plurality of servers (nodes). The aspect of distributing a 
file across multiple nodes (e.g., servers of a cluster) is also discussed at the top of page 12: 

Pieces distribution algorithm have to place it on appropriate amount of 
nodes (in agreement with fault tolerance requirements), and provide some 
migration mechanism to minimize a network loading (just move each piece to 
node on which it will be used maximally intensively). 

15. The aspect "wherein the list is used to obtain information for reconstructing files 
stored on the distributed file system" is discussed, for example, in the second and third 
paragraphs of section 6 on page 9, quoted above, where it is clear from the context of the 
document that the file is reconstructed from the servers on which the N pieces are stored. Also, 
the portion of the TopD document (Exhibit D/E) quoted above discusses the use of neighbor 
lists. 

16. Therefore, as the above demonstrates, the invention of claim 26 was conceived before 
the earliest priority date of the cited references. 

17. Independent claim 33 is reproduced below: 

33. A method for distributed file storage comprising: 

dividing a plurality of servers into a plurality of groups, with each server 
belonging to at least one group; 

on each server, maintaining a list of neighbor servers belonging to the 
same group; 

supporting file access services on each of the servers; 

dividing a file into a plurality of pieces that are derived from the file; and 

storing each of the pieces of each file on the servers selected from the list. 

18. The aspects of this claim recited in the second through fifth clauses ("on each 
server, maintaining a list of neighbor servers belonging to the same group," "supporting file 
access services on each of the servers," "dividing a file into a plurality of pieces that are derived 
from the file" and "storing each of the pieces of each file on the servers selected from the list") 
have been addressed above with reference to claim 26 and Exhibits A, D/E and F, discussed 
earlier. 

19. The aspect of "dividing a plurality of servers into a plurality of groups, with each 
server belonging to at least one group" is discussed, for example, in the TopD document (Exhibit 
D/E): 

При запуске TopD получает список соседей (neighbour list), некоторые из 
этих соседей будут статическими, то есть TopD будет стараться 
поддерживать постоянную связь с ними, остальные динамическими, то есть 
они будут заменяться с помощью алгоритма динамического 
конфигурирования. Суть этого алгоритма в том, чтобы динамическими 
соседями TopD являлись наиболее близкие (по некоторым критериям) ноды 
TorFS. Каждый сосед в списке имеет свой номер, который используется при 
маршрутизации пакетов. Соседом номер 0 является сам TopD. 

[UPON LAUNCH, THE TOPD RECEIVES A LIST OF NEIGHBORS, SOME 
OF WHICH MIGHT BE STATIC, IN OTHER WORDS, TOPD WILL 
ATTEMPT TO MAINTAIN CONSTANT COMMUNICATION WITH THEM, 
AND OTHERS WILL BE DYNAMIC, IN OTHER WORDS, THEY WILL BE 
REPLACED USING A MECHANISM OF DYNAMIC CONFIGURATION. 
THE GIST OF THE ALGORITHM IS IN ENSURING THAT THE DYNAMIC 
NEIGHBORS OF TOPD ARE THOSE NODES THAT ARE THE CLOSEST 
(BY SOME CRITERIA) OF TORFS. EACH NEIGHBOR IN THE LIST HAS 
ITS OWN NUMBER, USED FOR ROUTING PACKETS. THE NEIGHBOR 
NUMBER 0 IS THE TOPD ITSELF.] 

20. With a dynamically configurable list of neighbors, as discussed in the text, it is 
self-evident that any TopD server can be part of one or more groups (which can change over time). 

21. Furthermore, this aspect is discussed in Exhibit A, page 5: 

Locality of all algorithms: 

each node have information only about some adjacent neighbors it interested 
in and never have any global tables 

22. Therefore, as the above demonstrates, the invention of claim 33 was conceived before 
the earliest priority date of the cited references. 

23. Claim 45 is reproduced below: 

45. A method of accessing files in a distributed file storage system 
comprising: 

dividing a plurality of servers into a plurality of groups; 

supporting file access services on each of the servers for accessing a file 
stored on the servers; 

at each server, maintaining a list of neighbor servers that belong to the 
same group; 

generating a plurality of pieces from a file to be stored; and 

distributing the plurality of pieces to the neighbor servers in the same 
group in order to achieve a desired fault tolerance level. 

24. The aspect of "distributing the plurality of pieces to the neighbor servers in the same 
group in order to achieve a desired fault tolerance level" is discussed in Exhibit A, page 9, section 
6, entitled "Regulated redundancy and versioning," which is also quoted above. As noted above, 
(N,K) algorithms can be used for dividing the file into N pieces, such that any K pieces are 
sufficient to recover the file. It should be understood that when (N,K) algorithms are used, the N 
pieces are not the result of a simple "breakup" of the file into smaller pieces: the pieces are 
derived from the original file contents using an (N,K) algorithm, but they are not mere 
sub-portions of the file. 

25. Thus, a "desired fault tolerance level," in this context, refers to how many of the N 
servers/nodes can fail (the number "M" in the discussion at page 9, section 6, also quoted 
below), before the file becomes unrecoverable: 

Typically file divided into at least N = (K+M) pieces, where M - amount of the nodes 
which could disappear simultaneously. Each block is stored on a separate remote 
node. There are K+M individual nodes with block pieces and the removal of M of 
them still gives an ability to restore block. 
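
By way of illustration only, the relationship between N = K + M and the desired fault tolerance level can be checked with a short Python sketch. The function names `place_pieces` and `recoverable` and the node identifiers are assumptions made for the example; the only facts taken from the quoted passage are that each piece is stored on a separate node and that the removal of M nodes still allows the block to be restored.

```python
from itertools import combinations


def place_pieces(neighbors: list[str], k: int, m: int) -> dict[int, str]:
    """Assign N = K + M pieces to distinct servers taken from a neighbor list."""
    n = k + m
    distinct = list(dict.fromkeys(neighbors))      # drop duplicates, keep order
    if len(distinct) < n:
        raise ValueError(f"need {n} distinct neighbors, have {len(distinct)}")
    return {piece: distinct[piece] for piece in range(n)}


def recoverable(placement: dict[int, str], failed: set[str], k: int) -> bool:
    """A block is recoverable while at least K pieces sit on surviving nodes."""
    surviving = [p for p, node in placement.items() if node not in failed]
    return len(surviving) >= k


neighbors = ["node-1", "node-2", "node-3", "node-4", "node-5"]
K, M = 3, 2
placement = place_pieces(neighbors, K, M)
survives_any_m = all(recoverable(placement, set(down), K)
                     for down in combinations(neighbors, M))
survives_m_plus_1 = all(recoverable(placement, set(down), K)
                        for down in combinations(neighbors, M + 1))
```

Under these assumptions the block survives every combination of M = 2 simultaneous node failures, but not M + 1 failures, which is the sense in which M expresses the fault tolerance level.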

26. The remaining aspects of claim 45 have been addressed above, with reference to 
claims 26 and 33. 

27. Therefore, as the above demonstrates, the invention of claim 45 was conceived before 
the earliest priority date of the cited references. 

28. Claim 57 is reproduced below: 

57. A method of naming files in a distributed file storage system 
comprising: 

dividing a plurality of servers into a plurality of groups such that each 
server belongs to at least one group; 

supporting file access services on each of the servers for accessing files of 
the distributed file storage system; 

giving file names for the files uniformly within the distributed file storage 
system independent of location of the files on the servers; 

storing the files in the distributed file storage system using the names; and 

accessing the files using the file access services from any of servers. 

29. The aspect of file names in the context of the invention of claim 57 is discussed in 
numerous places in Exhibit A. For example, the following passages on page 3 discuss this aspect 
(bold is added): 

Considering Linux as a platform for ASP we could mention, that all 
implementation of local FS do not have the following set of ASP platform 
requirements: 

• distributed access 

• hierarchical uniform naming space 

• high performance (fast access to any file in cluster) 

• scalability support of 100s computers in network 

• consistency journaling support and transactional features, including fast faults 
recovery 

• fault tolerance - any computer could be switched off without loosing of 
access to any data in file system; any network connection could disappear 

30. File names are also disclosed in the context of page 4 of Exhibit A: 

Taking into account current features of Linux FS we could define the following 
set of data storage server requirements: 

• easy mapping to semantics of local FS on Linux - ASPdS should operate 
by the same concepts than local FS (at least, have superset of it implemented) 

• easy integration of it with local FS - localFS have to obtain a new features 
by calling some ASPdS services, and have to do it effectively 

• distributed access to files (any node in cluster could access it) 

31. The following passage on page 5 is also relevant to confirming the conception of this 
aspect and in particular, to the element "giving file names for the files uniformly within the 
distributed file storage system independent of location of the files on the servers": 

In such a implementation we use a local FS as a file cache for global data 
storage. 

32. The aspect of "storing the files in the distributed file storage system using the names" 
is discussed, for example, at page 5 of Exhibit A (native Linux file systems obviously use file 
names): 

each file stored on a local filesystem, application accesses data via a well 
balanced and robust implementations of native linux filesystems 

33. Other aspects of this claim have been discussed previously with regard to claims 26, 
33 and 45, and in the interest of avoiding redundancy, will not be repeated here. 

34. Therefore, as the above demonstrates, the invention of claim 57 was conceived before 
the earliest priority date of the cited references. 

35. Claim 66 is reproduced below: 

66. A system for organizing distributed file storage comprising: 

a plurality of functionally equivalent servers each providing file access 
services, for a plurality of clients, to files stored on the servers; and 

the files being divided into a plurality of pieces stored on the plurality of 
servers, 

wherein information for reconstructing files stored on the distributed file 
system is obtained from the functionally equivalent servers. 

36. The aspect of "wherein information for reconstructing files stored on the distributed 
file system is obtained from the functionally equivalent servers" is discussed in Exhibit A in 
connection with the (N,K) algorithm, as noted earlier. The servers are functionally equivalent in 
the sense that any K out of the N servers can be used to reconstruct the file — there is nothing 
special about any one server compared to any other. This is to be contrasted with such structures 
as RAIDs, where one of the servers (or drives) of the RAID represents parity or error 
detection/correction information, and others represent actual data — in this sense, the RAID 
"servers" are not functionally equivalent, since one of the servers (or drives) stores parity/EDAC 
data. Note that there is no separate server dedicated to storing catalogs and file descriptions. 

37. The remaining aspects recited in this claim have been addressed above, and in the 
interest of avoiding redundancy, will not be repeated here. 

38. Therefore, as the above demonstrates, the invention of claim 66 was conceived before 
the earliest priority date of the cited references. 

39. Claim 69 is a computer program product counterpart of claim 33, and all of the aspects 
recited in claim 69 have been addressed above. 

40. Thus, Exhibits A, D/E and F confirm that the invention as recited in all the 
independent claims was conceived prior to December 14, 2000, the earliest filing date of the 
cited references. 

41. Exhibits B and C, also submitted previously, are files from the SWsoft repository adapted 
for WinCVS, used by software developers for storing and using versions of the source code, and 
also for generating logs of subsequent revisions. The attached files are related to the 
implementation of at least the invention of the independent claims. 

42. CVS, the Concurrent Versions System (see, for example, 
http://www.skolelinux.org/portal/contribute/cvs intro/document view ), is a system that keeps 
track of changes to files in software projects. It creates a record of what was modified in a 
file, why, and by whom. Exhibits B and C show, for the indicated times and dates, changes relating to 
network connectivity code, including modification of system parameters, adaptation of program 
modules to the program shell and associated program modules, and bug fixing. 

43. As shown in file "topd_spec" (Exhibit D/E) and in the figure of Exhibit F, the 
software is a topological server that maintains network coherency, monitors network topology 
and maintains neighbors for hardware nodes. All interactions between nodes are implemented 
by means of topological servers. 

44. Files "config.h,v" (Exhibit B) and "fserv.cc,v" (Exhibit C) collectively represent a 
partial record of some of the work that was directed to actual reduction to practice of the 
invention. These two documents (which are two out of numerous files associated with this 
project) show the development process of an embodiment of the invention. These files reflect 
only a small portion of the actual work on a project of this magnitude, and typically reflect 
completion of a task that itself can take days or weeks to complete. 

45. Exhibits B and C confirm work on actual reduction to practice in the period from 
November 30, 2000 through June 19, 2001 ("fserv.cc,v" file) and from February 15, 2001 
through October 21, 2002 ("config.h,v" file). It should be noted that an entry in one of these 
files reflects work on certain project modules, as well as work (and entries) in many other files 
that are integral and substantial parts of the whole software code. 

46. Exhibit B, a file entitled "config.h,v," is a record of work on the project relating to 
TopD (Topological module, i.e., the clustered distributed file system discussed in Exhibit A) by 
Ruslan Iljin ("author media"), an employee (in 2001) of SWsoft, the Assignee of this application 
and Kirill Korotayev ("author dev"), another employee of the Assignee of this application, 
between 02/15/2001 and 10/21/2002. 
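
By way of illustration only, the revision headers reproduced from Exhibits B and C follow the pattern "date YYYY.MM.DD.hh.mm.ss; author NAME; state STATE;", and a header of that form can be checked against a date range with a short Python sketch. The function names `parse_revision_header` and `in_window` and the regular expression are assumptions made for the example; they are not part of CVS or of the exhibits.

```python
import re
from datetime import date, datetime

# Pattern for one revision header line as it appears in the reproduced entries.
HEADER = re.compile(
    r"date\s+(?P<date>[\d.]+);\s*author\s+(?P<author>\w+);\s*state\s+(?P<state>\w+);"
)


def parse_revision_header(line: str) -> tuple[datetime, str, str]:
    """Extract the timestamp, author and state from a revision header line."""
    match = HEADER.search(line)
    if match is None:
        raise ValueError(f"not a revision header: {line!r}")
    stamp = datetime.strptime(match["date"], "%Y.%m.%d.%H.%M.%S")
    return stamp, match["author"], match["state"]


def in_window(stamp: datetime, start: date, end: date) -> bool:
    """True when the revision date falls within [start, end], inclusive."""
    return start <= stamp.date() <= end


stamp, author, state = parse_revision_header(
    "date 2001.06.06.09.29.18; author media; state Exp;"
)
relevant = in_window(stamp, date(2001, 2, 15), date(2001, 7, 30))
```

The sample header is the one shown for revision 1.12 of config.h,v, and the window is the period from 02/15/2001 through 07/30/2001 identified as relevant for that file.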

47. The config.h,v file thus represents evidence of work directed to actual reduction to 
practice of the invention. Of relevance here is the time period 02/15/2001 through 07/30/2001, 
as reflected in the file. Some of the entries from the config.h,v file have been reproduced below. 
Each entry has a number (e.g., 1.1, 1.2, 1.3, etc.), a date (highlighted for the Examiner's 
convenience) and a description of the work (in italic), which is copied to the list below from 
pages 2-6 of the config.h,v file. Commentary explaining the entries and their relevance has 
been added in [bold]: 

1.12 [MODIFYING DEFAULT VALUES OF VARIABLES USED FOR 
NETWORK DISTANCE MONITORING, REQUIRED FOR NEIGHBOUR 
SERVERS DEFINITION, MODIFYING NETWORK READ AND WRITE 
REQUESTS EXECUTION, MODIFYING PROCEDURES FOR SEARCHING AND 
RETRIEVING SLICES/CHUNKS OF THE FILES STORED ON THE 
DISTRIBUTED STORAGE, SEE, E.G., CLAIM 26 ("a list of neighbor servers 
maintained by each server, wherein the list is used to obtain information for 
reconstructing files stored on the distributed file system")] 
date 2001.06.06.09.29.18; author media; state Exp; 
branches; 
[log 

@*** empty log message *** 

@ 
text 

@d6 1 
a6 1 

* $Id: config.h,v 1.11 2001/05/14 13:48:45 media Exp $ 
d36 1] 

1.11 [UPDATING VARIABLE FOR NETWORK DISTANCE CALCULATION, A 
LOT OF BUGS IN ASSOCIATED FILE LISTS FIXED (MODIFICATION OF 
PROCEDURE OF DETECTING ACTIVE TRANSMISSION CHANNELS), SEE, 
E.G., CLAIM 26 ("a list of neighbor servers maintained by each server, wherein the 
list is used to obtain information for reconstructing files stored on the distributed 
file system")] 

date 2001.05.14.13.48.45; author media; state Exp; 
branches; 

[ @fixed many bugs in lists 

@ 
text 

@d6 1 

a6 1 

* $Id: config.h,v 1.10 2001/05/02 12:12:18 dev Exp $ 
d56 2] 



1.9 [ADDING TOPD CONFIGURATION DATA TO SUPPORT FILE ACCESS 
SERVICES, SEE, E.G., CLAIM 26 ("a list of neighbor servers maintained by each 
server, wherein the list is used to obtain information for reconstructing files stored 
on the distributed file system"), CLAIM 57 ("supporting file access services on each 
of the servers for accessing files of the distributed file storage system")] 
date 2001.04.13.09.23.57; author media; state Exp; 
branches; 

[@remove trailing spaces 

@ 

text 

@d2 1 

a2 1 

* Copyright (C)SWSoft 2000-2001 
d4 3 
a6 1 

* $Id: config.h,v 1.8 2001/04/12 09:51:01 media Exp $ 

a8 2 
* 

* Author: Ruslan Iljin <media@@www.rt.mipt.ru>] 

1.8 [MODIFICATION OF TIMEOUT CONSTANTS TO IMPROVE STABILITY 
DURING NETWORK STORAGE ACCESS, I.E., DURING ACCESS OF THE FILE 
PIECES STORED ON THE SERVERS (MODIFICATION OF PROCEDURE 
REQUIRED FOR NETWORK TOPOLOGY MONITORING, IN PARTICULAR, 
FOR MONITORING NEIGHBOUR SERVERS FOR PLURALITY OF SERVERS, 
IN INTERMEDIATE VERSION - MODIFYING NEIGHBOUR SERVERS LIST 
IN DEPENDENCE OF NETWORK CONDITIONS), SEE, E.G., CLAIM 26 ("a list 
of neighbor servers maintained by each server, wherein the list is used to obtain 
information for reconstructing files stored on the distributed file system"), SEE 
ALSO CLAIM 45 ("distributing the plurality of pieces to the neighbor servers in the 
same group in order to achieve a desired fault tolerance level")] 
date 2001.04.12.09.51.01; author media; state Exp; 
branches; 
[log 

@header files description 

@ 
text 
@d4 1 
a4 1 

* $Id: config.h,v 1.41 2001/04/12 08:26:34 media Exp $ 
d11 1 

a11 1] 

1.7 [BUG FIXING, DEFINE DEFAULT TIMEOUTS TO AVOID STARVATION 
WHILE RETRIEVING FILE DATA AND FILE SYSTEM DATA, CODE 
REVIEW (MODIFICATION OF PROCEDURE REQUIRED FOR NETWORK 
TOPOLOGY MONITORING, IN PARTICULAR, FOR MONITORING 
NEIGHBOUR SERVERS FOR A PLURALITY OF SERVERS, IN 
INTERMEDIATE VERSION - MODIFYING NEIGHBOUR SERVERS LIST 
DEPENDING ON NETWORK CONDITIONS), SEE CLAIM 57 ("supporting file 
access services on each of the servers for accessing files of the distributed file storage 
system")] 

date 2001.04.10.14.55.41; author media; state Exp; 

branches; 

[log 
@code review 

@ 
text 

@d1 11 

d39 1 

a39 1 

/* timeouts */ 
d44 2 
a45 2 

DECLARE int TOPD_TIMEOUT_WRITE_STAGE_1 DEFAULT(20); 
DECLARE int TOPD_TIMEOUT_WRITE_STAGE_2 DEFAULT(20); 
d47 2 
a48 2 

DECLARE int TOPD_TIMEOUT_READ_STAGE_1 DEFAULT(60); 
DECLARE int TOPD_TIMEOUT_READ_STAGE_2 DEFAULT(60); 
d50 2 
a51 2 

DECLARE int TOPD_TIMEOUT_SNAP_STAGE_1 DEFAULT(15); 
DECLARE int TOPD_TIMEOUT_SNAP_STAGE_2 DEFAULT(20); 
d53 1 
a53 1 

DECLARE int TOPD_TIMEOUT_DIRREC_STAGE_2 DEFAULT(15);] 



1.6 [MODIFICATION OF ALGORITHM FOR DEFINING NEIGHBOR 
SERVERS AND COMPILATION OF DYNAMICALLY CHANGED TABLES OF 
NEIGHBOR SERVERS (MODIFICATION OF PROCEDURES RELATED TO 
ACTIVE CHANNELS MONITORING), SEE, E.G., CLAIM 26 ("a list of neighbor 
servers maintained by each server, wherein the list is used to obtain information for 
reconstructing files stored on the distributed file system")] 
date 2001.03.31.14.24.19; author media; state Exp; 
branches; 
[log 

@changes in dyn tables 

@ 

text 

@d28 17 
a44 1 

DECLARE int TOPD_TIMEOUT_RECONNECT DEFAULT(30);] 



1.5 [MODIFICATION OF NETWORK DISTANCES MONITORING 
FUNCTIONS FOR NEIGHBOR SERVERS DEFINITION (LIST OF NEIGHBOR 
SERVERS), MODIFICATION OF FILE SLICES/CHUNKS RETRIEVAL 
ALGORITHM, SEE, E.G., CLAIM 26 ("a list of neighbor servers maintained by 
each server, wherein the list is used to obtain information for reconstructing files 
stored on the distributed file system")] 

date 2001.03.30.10.31.07; author media; state Exp; 

branches; 

[@changes for new analize functions 

@ 
text 

@d34 3 
a36 3 

#define _TOPD_READ_ALL_SLICES_ 0 /* read all available slices? if 0 read only 
required number of slices */ 

#define _TOPD_REMOVE_AFTER_READ_ 0 /* remove snap entry after reading? */ 
#define _TOPD_LOOP_PING_ 0 /* continue pinging neighbours for dynamic 
table? */] 

1.4 [DECLARATION OF NEW VARIABLES AND FUNCTIONS, 
MODIFICATION OF DEFAULT TIMEOUTS TO IMPROVE STABILITY 
WHILE RETRIEVING FILE DATA AND FILE SYSTEM DATA, SEE, E.G., 
CLAIM 26 ("a list of neighbor servers maintained by each server, wherein the list is 
used to obtain information for reconstructing files stored on the distributed file 
system")] 

date 2001.03.22.09.14.13; author media; state Exp; 
branches; 

[@*** empty log message *** 

@ 
text 

@d1 2 

all 

#ifndef _CONFIG_H_ 
#define _CONFIG_H_ 
d33 4] 

1.3 [MODIFICATION OF PROCEDURE OF POLLING NETWORK SERVERS 
(NODES) WHILE RETRIEVING FILE DATA AND FILE SYSTEM DATA, 
DECLARATION OF NEW VARIABLES, SEE, E.G., CLAIM 26 ("a list of 
neighbor servers maintained by each server, wherein the list is used to obtain 
information for reconstructing files stored on the distributed file system"), CLAIM 
66 ("wherein information for reconstructing files stored on the distributed file 
system is obtained from the functionally equivalent servers")] 
date 2001.03.12.11.18.26; author media; state Exp; 

branches; 

next 1.2; 

1.2 [DISCLAIMING TIMEOUT DEFINING 
VARIABLES, AND ALSO VARIABLES REQUIRED FOR RETRIEVING 
CHUNKS OF THE FILES (IN AN EARLIER VERSION, IMPROVING 
STABILITY DURING PROCESSING OF DIRECTORY RECORDS), SEE 
CLAIM 57 ("supporting file access services on each of the servers for accessing files 
of the distributed file storage system"), CLAIM 66 ("wherein information for 
reconstructing files stored on the distributed file system is obtained from the 
functionally equivalent servers")] 

date 2001.02.26.12.15.11; author media; state Exp; 

branches; 

next 1.1; 

1.1 [THE INITIAL VERSION OF CONFIGURING PROCEDURE FOR 
CONFIGURING SERVERS/NODES OF THE NEIGHBOR LIST, SEE, E.G., 
CLAIM 26 ("a list of neighbor servers maintained by each server, wherein the list is 
used to obtain information for reconstructing files stored on the distributed file 
system"), CLAIM 66 ("wherein information for reconstructing files stored on the 
distributed file system is obtained from the functionally equivalent servers")] 

date 2001.02.15.14.04.50; author media; state Exp; 

branches; 

[log 

@*** empty log message *** 

@ 

text 

@a22 12 

DECLARE char* TopdAddress DEFAULT("127.0.0.1"); 

DECLARE char* WorkDir DEFAULT("~/clroot/"); 
DECLARE char* TmpDir DEFAULT("/tmp/"); 

DECLARE int PutFileMaxBloxPerStep DEFAULT(100); 
DECLARE int GetFileMaxBloxPerStep DEFAULT(100); 

DECLARE int TimeOutPutOBlock DEFAULT(5); 
DECLARE int TimeOutGetSnap DEFAULT(5); 
DECLARE int TimeOutTopdReconnect DEFAULT(5);] 

48. Exhibit C, a file entitled "fserv.cc,v," is a record of work on the project relating to the 
file server ("fserv") by Ruslan Iljin ("author media"). This file is a part of the overall project 
and shows a log of changes, including the date and author of the changes, a final version of the 
source code, and the substance of the changes made during development in the "fserv.cc" 
module, as well as a schedule of project revisions, since some modifications of the file 
header are forced by modifications of associated modules. 

49. Program module fserv.cc implements a program interface for file-related data request 
execution, and also for configuration and maintenance of the connections to the neighbor 
nodes. This code is required for the file server interface, and therefore represents a portion of 
one embodiment of the invention. 

50. As shown in the comments embedded in the program code, fserv.cc contains the 
interface to the file server and also implements the following functions: 

Sending packets to client channel; 

Sending snap, read, dirrec and write requests to fserv, where fserv is a node required for 
data collection; 

Processing snap reply from fserv; 
Processing read reply from fserv; 
Processing dirrecord reply from fserv; 
Processing block0 reply from fserv. 

51. Each such file server "fserv" is the node (server) on which a file piece of the file is 
stored. The fserv.cc,v file therefore also represents evidence of work directed to actual reduction 
to practice of the invention. 

52. Of relevance here are entries from the time period from 11/30/2000 through 
6/19/2001. Some of these entries have been reproduced below, in a similar format as above: 

1.34 [MODIFYING PROCEDURE OF DETECTING NEIGHBOR SERVERS 
(ADDED CODE WHERE HOPS ARE USED FOR DETECTION OF NETWORK 
DISTANCE), SEE CLAIM 66 ("wherein information for reconstructing files stored 
on the distributed file system is obtained from the functionally equivalent servers")] 

date 2001.06.19.13.15.53; author media; state Exp; 

branches; 

[log 

@*** empty log message *** 
@ 
text 
@d6 1 
a6 1 

* $Id: fserv.cc,v 1.32 2001/06/01 12:22:23 media Exp $ 
a161 1] 

1.33 [ADDING TRACING LINES, MODIFYING NETWORK READ AND 
WRITE REQUEST EXECUTION FOR STORAGE OF FILE PIECES/CHUNKS 
ON THE SERVERS, SEE, E.G., CLAIM 26 ("a plurality of servers providing, to a 
plurality of clients, file access services for accessing files stored on the plurality of 
servers; wherein the files are divided into a plurality of pieces stored on the plurality 
of servers, and wherein the list is used to obtain information for reconstructing files 
stored on the distributed file system"), SEE ALSO CLAIM 33 ("supporting file 
access services on each of the servers; dividing a file into a plurality of pieces that 
are derived from the file; and storing each of the pieces of each file on the servers 
selected from the list")] 

date 2001.06.06.09.29.18; author media; state Exp; 

branches; 

[log 

@*** empty log message *** 
@ 
text 
@d6 1 
a6 1 

* $Id: fserv.cc,v 1.31 2001/06/01 10:22:34 media Exp $ 
d56 1 
a56 1 

tfsPktType p_type[6]={ 
d60 1 
d70 1 
d91 1 
a91 3 

* parameters : reqid - RequestID of snap request 
* buff - address of data buffer 
* length - length of data buffer 
d110 1 
a110 3 

* parameters : reqid - RequestID of snap request 
* buff - address of data buffer 
* length - length of data buffer 
d129 1 
a129 3 

* parameters : reqid - RequestID of snap request 
* buff - address of data buffer 
* length - length of data buffer 
d141 19] 

1.32 [MODIFYING NETWORK READ AND WRITE REQUESTS EXECUTION, 
MODIFYING PROCEDURES FOR SEARCHING AND RETRIEVING SLICES 
/CHUNKS OF THE FILES STORED ON THE DISTRIBUTED STORAGE, SEE 
CLAIM 57 ("supporting file access services on each of the servers for accessing files 
of the distributed file storage system"), CLAIM 66 ("wherein information for 
reconstructing files stored on the distributed file system is obtained from the 
functionally equivalent servers")] 

date 2001.06.01.12.22.23; author media; state Exp; 

branches; 

[log 

@*** empty log message *** 
@ 
text 
@d6 1 
a6 1 

* $Id: fserv.cc,v 1.31 2001/06/01 10:22:34 media Exp $ 
d56 1 
a56 1 

tfsPktType p_type[6]={ 
d60 1 
d70 1 
d91 1 
a91 3 

* parameters : reqid - RequestID of snap request 
* buff - address of data buffer 
* length - length of data buffer 
d110 1 
a110 3 

* parameters : reqid - RequestID of snap request 
* buff - address of data buffer 
* length - length of data buffer 
d129 1 
a129 3 

* parameters : reqid - RequestID of snap request 
* buff - address of data buffer 
* length - length of data buffer 
d141 19] 



1.31 [MODIFYING NEIGHBOR SERVERS HANDLING ALGORITHM IN CASE 

OF LACK OF AVAILABLE MEMORY, MODIFYING ALGORITHM OF 

MODIFICATION OF LIST OF NEIGHBORS, SEE CLAIM 66 ("wherein 

information for reconstructing files stored on the distributed file system is obtained 

from the functionally equivalent servers")] 

date 2001.06.01.10.22.34; author media; state Exp; 

branches; 

[log 

@*** empty log message *** 
@ 
text 
@d6 1 
a6 1 

* $Id: fserv.cc,v 1.30 2001/05/02 12:12:18 dev Exp $ 
d95 1 
a95 1 

int tfsTopd::fserv_search_reply(tfsRequestID reqid, char* buff, int length) 
d101 2 
a102 2 

if (header_get_hops_number(buff, length)) rc=initiate_search_reply(reqid, buff, length, 
TOPD_SLICE_SEARCH_REPLY); 
else rc=search_reply(reqid, get_neighbour_number(myself), buff, length, 
TOPD_SLICE_SEARCH_REPLY); 
d116 1 
a116 1 

int tfsTopd::fserv_read_reply(tfsRequestID reqid, char* buff, int length) 
d122 2 
a123 2 

if (header_get_hops_number(buff, length)) rc=initiate_search_reply(reqid, buff, length, 
TOPD_SLICE_READ_REPLY); 
else rc=search_reply(reqid, get_neighbour_number(myself), buff, length, 
TOPD_SLICE_READ_REPLY); 
d137 1 
a137 1 

int tfsTopd::fserv_dirrecord_reply(tfsRequestID reqid, char* buff, int length) 
d143 2 
a144 2 

if (header_get_hops_number(buff, length)) rc=initiate_search_reply(reqid, buff, length, 
TOPD_DIRREC_SEARCH_REPLY); 
else rc=search_reply(reqid, get_neighbour_number(myself), buff, length, 
TOPD_DIRREC_SEARCH_REPLY);] 

1.30 [REVIEWING CODE IN ASSOCIATED MODULES, DEBUG FUNCTIONS 
USED INSTEAD OF SIMPLE LOGGING OF PROGRAM EXECUTION FOR 
APPROACHING FULL FUNCTIONALITY OF DEVELOPED MODULES, A 
UTILITY NECESSARY FOR OVERALL FUNCTIONALITY OF THE 
PROJECT] 

date 2001.05.02.12.12.18; author dev; state Exp; 

branches; 

[log 

@Global code review. 
TFS assert -> ASSERT 
removed all tfsLog functions. debug() is used instead 
SWSoft -> SWsoft :) 
@ 
text 
@d6 1 
a6 1 

* $Id: fserv.cc,v 1.29 2001/04/25 16:18:56 media Exp $ 
d79 2] 



1.28 [MODIFYING INTERFACE TO FILE SERVER (SEARCHING AVAILABLE 
NEIGHBOUR SERVERS, JOINING SLICES/CHUNKS BACK INTO THE FILE 
AFTER READ REQUEST EXECUTION), SEE CLAIM 57 ("supporting file access 
services on each of the servers for accessing files of the distributed file storage 
system"), CLAIM 66 ("wherein information for reconstructing files stored on the 
distributed file system is obtained from the functionally equivalent servers")] 
date 2001.04.20.10.46.27; author media; state Exp; 
branches; 
[log 

@join read request 
@ 
text 
@d4 1 
a4 1 

* $Id: fserv.cc,v 1.27 2001/04/20 10:26:04 media Exp $ 
d100 1 
a100 1 

else rc=search_reply(reqid, get_neighbour_number(myself), buff, length, 
TOPD_DIRREC_SEARCH_REPLY);] 

1.27 [MODIFYING TRACING LINES, MODIFYING CODE RELATED TO 
DIRECTORY RECORDS AND SLICE/CHUNK ACCESS, SEARCH REQUEST 
FINAL IMPLEMENTATION, SEE CLAIM 57 ("supporting file access services on 
each of the servers for accessing files of the distributed file storage system"), 
CLAIM 66 ("wherein information for reconstructing files stored on the distributed 
file system is obtained from the functionally equivalent servers")] 
date 2001.04.20.10.26.04; author media; state Exp; 
branches; 

[@final search request implementation 
@ 
text 
@d4 1 
a4 1 

* $Id: fserv.cc,v 1.26 2001/04/20 09:23:00 media Exp $ 
d120 2 
a121 2 

if (header_get_hops_number(buff, length)) rc=initiate_slice_transfer_reply(reqid, buff, 
length); 
else rc=slice_transfer_reply(reqid, get_neighbour_number(myself), buff, length);] 

1.26 [MODIFYING SEARCH ALGORITHM OF NEIGHBOR SERVERS FOR 
DATA TO BE TRANSFERRED, SEE, E.G., CLAIM 26 ("a list of neighbor servers 
maintained by each server, wherein the list is used to obtain information for 
reconstructing files stored on the distributed file system")] 
date 2001.04.20.09.23.00; author media; state Exp; 
branches; 

[@join search proc 
@ 
text 
@d4 1 
a4 1 

* $Id: fserv.cc,v 1.25 2001/04/19 11:10:46 media Exp $ 
d100 1 
a100 1 

else rc=slice_search_reply(reqid, get_neighbour_number(myself), buff, length); 
d142 1 
a142 1 

else rc=dirrecord_reply(reqid, get_neighbour_number(myself), buff, length);] 

1.24 [MODIFICATION OF DIRECTORY PROCESSING WHILE RETRIEVING 
AND JOINING SLICES OF FILES STORED ON THE NETWORKED SERVERS, 
SEE CLAIM 57 ("supporting file access services on each of the servers for accessing 
files of the distributed file storage system"), CLAIM 66 ("wherein information for 
reconstructing files stored on the distributed file system is obtained from the 
functionally equivalent servers")] 
date 2001.04.19.08.22.10; author media; state Exp; 
branches; 

[@remove neighbour address attachment 
@ 
text 
@d4 1 
a4 1 

* $Id: fserv.cc,v 1.23 2001/04/13 09:23:57 media Exp $ 
d66 6 
a71 6 

if (req_type==SNAP_REQUEST) PRINTD2("send snap request to fserv"); 
if (req_type==READ_REQUEST) PRINTD2("send read request to fserv"); 
if (req_type==DIRREC_REQUEST) PRINTD2("send directory record list request to 
fserv"); 
if (req_type==SLICE_WRITE) PRINTD2("send slice write request to fserv"); 
if (req_type==ZEROBLOCK_WRITE) PRINTD2("send zeroblock write request to 
fserv"); 
if (req_type==DIRREC_WRITE) PRINTD2("send directory record write request to 
fserv");] 

1.22 [ IMPROVE STABILITY OF DATA TRANSMISSION BETWEEN 
SERVERS, MODIFYING DEBUGGING LINES FOR FIXING PROBLEMS 
WITH SERVERS' STORAGE CAPABILITIES, SEE CLAIM 45 ("distributing the 
plurality of pieces to the neighbor servers in the same group in order to achieve a 
desired fault tolerance level")] 

date 2001.04.12.11.37.14; author media; state Exp; 
branches; 

[@change no memory debug level 
@ 
text 
@d4 1 
a4 1 

* $Id: fserv.cc,v 1.21 2001/04/12 11:35:27 media Exp $ 
d77 1 
a77 1 

}] 



1.16 [DEBUGGING LINES MODIFICATION, MODIFYING PROCESSING OF 

RETRIEVING AND PROCESSING IDENTIFIERS OF SLICES/CHUNKS OF 

THE FILES THAT HAVE BEEN DISTRIBUTED TO THE SERVERS, SEE 

CLAIM 57 ("supporting file access services on each of the servers for accessing files 

of the distributed file storage system"), CLAIM 66 ("wherein information for 

reconstructing files stored on the distributed file system is obtained from the 

functionally equivalent servers")] 

date 2001.04.04.10.33.33; author media; state Exp; 

branches; 

[@code review 
@ 
text 
@d4 1 
a4 1 

* Author: Ruslan Iljin <media@@www.rt.mipt.ru> 
d6 1 
a6 1 

* $Id: fserv.cc,v 1.15 2001/03/30 12:43:36 media Exp $ 
d8 1 
a8 6 
*/ 

/* 
* fserv.cc 
* 
* This file contains interface to file server 
d14 8 
d28 1 
a28 1 

PRINTD2("fileserver_send: there is no fserv\n"); 
d34 1 
a34 1 

PRINTD2("Packet sent to fserv\n"); 
d41 1 
a41 1 

int tfsTopd::fserv_write_request(tfsRequestID reqid, char* buff, int length, int req_type) 
d44 7 
a50 1 

tfsPktType p_type[3]={TFS_MSG_T2F_ZEROBLOCK_WRITE_REQ, 
TFS_MSG_T2F_SLICE_WRITE_REQ, TFS_MSG_T2F_DIRRECORD_WRITE_REQ}; 
d53 6 
a58 3 

if (req_type==SLICE_WRITE) PRINTD2("send slice write request to fserv\n"); 
if (req_type==ZEROBLOCK_WRITE) PRINTD2("send zeroblock write request to 
fserv\n"); 
if (req_type==DIRREC_WRITE) PRINTD2("send directory record write request to 
fserv\n"); 
a63 22 

int tfsTopd::fserv_search_request(tfsRequestID reqid, char* buff, int length) 
{ 
int rc; 

INFUNC(tfsTopd::fserv_search_request); 
PRINTD2("send slice search request to fserv\n"); 
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_SNAP_REQ, length, buff, reqid); 
rc = fserv_send(p); 
OUTFUNCINT(rc); 
} 

int tfsTopd::fserv_dirrecord_request(tfsRequestID reqid, char* buff, int length) 
{ 
int rc; 

INFUNC(tfsTopd::fserv_dirrecord_request); 
PRINTD2("send dirrec request to fserv\n"); 
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_DIRRECORD_LIST_REQ, length, buff, reqid); 
rc = fserv_send(p); 
OUTFUNCINT(rc); 
} 

d69 1 
a69 1 

PRINTD2("got slice search reply from fserv\n"); 
d80 1 
a80 1 

PRINTD2("got dirrec reply from fserv\n"); 
a85 11 

int tfsTopd::fserv_read_request(tfsRequestID reqid, char* buff, int length) 
{ 
int rc; 

INFUNC(tfsTopd::fserv_read_request); 
PRINTD2("send slice read request to fserv\n"); 
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_SLICE_READ_REQ, length, buff, reqid); 
rc = fserv_send(p); 
OUTFUNCINT(rc); 
} 

d91 1 
a91 1 

PRINTD2("got slice read reply from fserv\n");] 

1.15 [DEBUGGING PROCESS MODIFICATION, MODIFYING PROCEDURES 
OF CHANNEL OPENING AND CONNECTING VIA OPENED CHANNEL 
WHILE PROCESSING PACKETS OF DATA BETWEEN SERVERS (NODES) 
OF THE DISTRIBUTED STORAGE (MODIFICATION OF PROCEDURE OF 
RETRIEVING CHUNKS FROM NEIGHBOR SERVERS AND SEARCHING 
SERVERS CONTAINING DATA RELATED TO CHUNKS OF FILES), SEE, E.G., 
CLAIM 26 ("a list of neighbor servers maintained by each server, wherein the list is 
used to obtain information for reconstructing files stored on the distributed file 
system")] 

date 2001.03.30.12.43.36; author media; state Exp; 
branches; 

[@removed tabs spaces 
@ 
text 
@d4 1 
a4 1 

* Author: Ruslan Iljin <media@@www.rt.mipt.ru> 
d6 1 
a6 1 

* $Id: fserv.cc,v 1.14 2001/03/23 15:29:23 sinw Exp $ 
d10 2 
a11 2 

/* 
* fserv.cc 
d23 1 
d26 1 
a26 1 

return -1; 
d29 2 
a30 2 

if ((C_fserv->state == tfsChannel::CONNECTED)||(C_fserv->state == 
tfsChannel::OPENED)) 
{ 
d34 2 
a35 2 

} 
return count; 
d40 1 
d42 2 
d47 3 
a49 2 

tfsPKT *p = new tfsPKT(p_type[req_type],length,buff,reqid); 
fserv_send(p); 
d51 1 
a51 1 

d54 7 
a60 3 

PRINTD2("send slice search request to fserv\n"); 
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_SNAP_REQ,length,buff,reqid); 
fserv_send(p); 
d65 7 
a71 3 

PRINTD2("send dirrec request to fserv\n"); 
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_DIRRECORD_LIST_REQ,length,buff,reqid); 
fserv_send(p); 
d76 7 
a82 3 

PRINTD2("got slice search reply from fserv\n"); 
if (header_get_hops_number(buff, length)) initiate_slice_search_reply(reqid, buff, 
length); 
else slice_search_reply(reqid, myself->Address, buff, length); 
d87 7 
a93 3 

PRINTD2("got dirrec reply from fserv\n"); 
if (header_get_hops_number(buff, length)) initiate_dirrecord_reply(reqid, buff, length); 
else dirrecord_reply(reqid, myself->Address, buff, length); 
d98 7 
a104 3 

PRINTD2("send slice read request to fserv\n"); 
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_SLICE_READ_REQ,length,buff,reqid); 
fserv_send(p); 
d109 7 
a115 3 

PRINTD2("got slice read reply from fserv\n"); 
if (header_get_hops_number(buff, length)) initiate_slice_transfer_reply(reqid, buff, 
length); 
else slice_transfer_reply(reqid, myself->Address, buff, length);] 

1.14 [FILE HEADER AND COPYRIGHTS MODIFICATION (MODIFYING 
PROCEDURE OF NEIGHBOR SERVERS TABLE CORRECTION, ALSO 
MODIFYING ALGORITHM OF SEARCHING NEIGHBOR SERVER 
CONTAINING REQUIRED CHUNK, IMPROVE STABILITY OF NETWORK 
CONNECTIONS), SEE CLAIM 57 ("supporting file access services on each of the 
servers for accessing files of the distributed file storage system"), CLAIM 66 
("wherein information for reconstructing files stored on the distributed file system 
is obtained from the functionally equivalent servers")] 
date 2001.03.23.15.29.23; author sinvv; state Exp; 
branches; 

[@Change copyright sinw@@ 
@ 
text 
@d6 1 
a6 1 

* $Id: /home/cvs/aspfs/topd/fserv.cc,v 1.13 2001/02/23 11:15:18 media Exp $ 
d24 2 
a25 2 

PRINTD2("fileserver_send: there is no fserv\n"); 
return -1; 
d29 5 
a33 5 

{ 
PRINTD2("Packet sent to fserv\n"); 
C_fserv->PutQ(p); 
count++; 
}] 



1.13 [DEBUGGING PROCESS MODIFICATION (MODIFYING SLICES/CHUNK 
SEARCHING PROCEDURE FOR SEARCHING FOR FILE CHUNKS STORED 
ON THE SERVERS, MODIFYING REQUESTS PROCESSING, 
MODIFICATION OF ALGORITHM OF FILE SYSTEM DATA PROCESSING), SEE, 
E.G., CLAIM 57 ("dividing a plurality of servers into a plurality of groups such that 
each server belongs to at least one group; supporting file access services on each of 
the servers for accessing files of the distributed file storage system; giving file names 
for the files uniformly within the distributed file storage system independent of 
location of the files on the servers; storing the files in the distributed file storage 
system using the names; and accessing the files using the file access services from 
any of servers")] 

date 2001.02.23.11.15.18; author media; state Exp; 

branches; 

log 

@*** empty log message *** 

@ 
text 
@d2 1 
d4 1 
a4 1 

* Copyright (C) SWSoft 1999-2000 
d6 1 
a6 6 

* Author: Ruslan Iljin 
* E-mail: media@@www.rt.mipt.ru 
* $Header: /home/cvs/aspfs/topd/fserv.cc,v 1.12 2001/02/22 14:52:34 media Exp $ 
* 
* Last correct: 16/Nov/00 

@ 



1.12 [DEBUGGING PROCESS MODIFICATION, MODIFYING SLICES AND 
DIRECTORY RECORDS WRITING TO NEIGHBOR SERVERS, ALSO 
IMPROVEMENT IN NETWORK COMMUNICATIONS, SEE CLAIM 66 
("wherein information for reconstructing files stored on the distributed file system 
is obtained from the functionally equivalent servers")] 
date 2001.02.22.14.52.34; author media; state Exp; 
branches; 
[log 

@*** empty log message *** 
@ 
text 
@d8 1 
a8 1 

* $Header: /home/cvs/aspfs/topd/fserv.cc,v 1.11 2001/01/30 17:48:43 media Exp $ 
d58 1 
a58 1 

int tfsTopd::fserv_params_request(tfsRequestID reqid, char* buff, int length) 
d60 2 
a61 2 

PRINTD2("send params request to fserv\n"); 
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_PARAMS_REQ, length, buff, reqid); 
d72 1 
a72 1 

int tfsTopd::fserv_params_reply(tfsRequestID reqid, char* buff, int length) 
d74 3 
a76 3 

PRINTD2("got params reply from fserv\n"); 
if (header_get_hops_number(buff, length)) initiate_params_reply(reqid, buff, length); 
else params_reply(reqid, myself->Address, buff, length);] 

1.11 [ADDING TRACING LINES FOR DEBUGGING, MODIFYING REQUESTS 
EXECUTION, MODIFYING SLICES PROCESSING PROCEDURE FOR 
ACCESSING FILE PIECES ON THE SERVERS (MODIFYING PROCEDURES 
PROCESSING REQUESTS FOR AVAILABLE SLICES AND PROCEDURES OF 
PROCESSING CORRESPONDING REPLIES, IN PREVIOUS VERSIONS - 
MODIFICATION OF ALGORITHM OF DETECTING CANDIDATES FOR 
TABLE OF NEIGHBOR SERVERS), SEE CLAIM 66 ("wherein information for 
reconstructing files stored on the distributed file system is obtained from the 
functionally equivalent servers")] 
date 2001.01.30.17.48.43; author media; state Exp; 
branches; 

[@*** empty log message *** 
@ 
text 
@d8 1 
a8 1 

* $Header: /home/cvs/aspfs/topd/fserv.cc,v 1.10 2001/01/30 17:29:44 media Exp $ 
d43 1 
a43 1 

tfsPktType p_type[2]={TFS_MSG_T2F_ZEROBLOCK_WRITE_REQ, 
TFS_MSG_T2F_SLICE_WRITE_REQ}; 
d45 2 
a46 1 

else PRINTD2("send zeroblock write request to fserv\n");] 



1.9 [ADDING TRACING LINES FOR DEBUGGING, MODIFYING 
PROCEDURES OF CHANNEL OPENING AND CONNECTING VIA OPENED 
CHANNEL WHILE PROCESSING PACKETS OF DATA BETWEEN SERVERS 
(NODES) OF THE DISTRIBUTED STORAGE (MODIFICATION OF 
ALGORITHM OF NEIGHBOR SERVERS TABLE CORRECTION), SEE, E.G., 
CLAIM 26 ("a list of neighbor servers maintained by each server, wherein the list is 
used to obtain information for reconstructing files stored on the distributed file 
system")] 

date 2000.12.05.11.43.19; author media; state Exp; 

branches; 

next 1.8; 

1.8 [FILE HEADER CHANGING ASSOCIATED WITH OTHER MODULES 

MODIFICATION (MODIFICATION OF ALGORITHM OF CHOOSING 

NEIGHBOR SERVER FOR WRITING FILE CHUNKS) , SEE CLAIM 57 

("supporting file access services on each of the servers for accessing files of the 

distributed file storage system"), CLAIM 66 ("wherein information for 

reconstructing files stored on the distributed file system is obtained from the 

functionally equivalent servers")] 

date 2000.12.05.09.38.18; author media; state Exp; 

branches; 

next 1.7; 



1.6 [MODIFICATION OF ROUTINES SUPPORTING SERVER CONNECTIONS 
(MODIFICATION OF PROCEDURE WRITING FILE CHUNKS TO NEIGHBOR 
SERVERS) , SEE CLAIM 57 ("supporting file access services on each of the 
servers for accessing files of the distributed file storage system"), CLAIM 66 
("wherein information for reconstructing files stored on the distributed file system 
is obtained from the functionally equivalent servers")] 
date 2000.11.30.15.11.41; author media; state Exp; 
branches; 
next 1.5; 

1.4 [INTERMEDIATE VERSION FOR NEIGHBOR SERVERS 
INTERCONNECTION, WHERE IMPLEMENTED MAIN FUNCTIONALITY OF 
SENDING AND RECEIVING DATA PACKETS, REQUIRED FOR DATA 
COLLECTION, INCLUDING FILE DATA AND DIRECTORY DATA, SEE 
CLAIM 57 ("supporting file access services on each of the servers for accessing files 
of the distributed file storage system"), CLAIM 66 ("wherein information for 
reconstructing files stored on the distributed file system is obtained from the 
functionally equivalent servers")] 

date 2000.11.30.13.01.11; author media; state Exp; 

branches; 

next 1.3; 

53. Thus, the invention was conceived prior to the filing dates of Boykin et al. and Lahr 
et al., and the inventors were working diligently on constructive and/or actual reduction to 
practice between the filing date of Boykin et al. and Lahr et al., and July 30, 2001, the filing date 
of this application. 

54. As the persons signing below, we hereby declare that all statements made herein of 
our own knowledge are true and that all statements made on information and belief are believed 
to be true; and further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, under § 1001 of 
Title 18 of the United States Code, and that such willful false statements may jeopardize the 
validity of the application or any patent issue thereupon. 



Date Alexander Tormasov 



Date Mikhail Khassine 



Date Serguei Beloussov 



Date Stanislav Protassov 
Implementation Strategy 
14. Things to be addressed 

1. Purpose 

The purpose of this document is to provide a short technical 
overview of RCDFS (Redundant Clustered Distributed Filesystem), 
the base filesystem for the ASP data server of Project Wolf. 

2. Overview 

A data server should handle data: data should be stored and 
delivered to the place of request. For such purposes a modern 
OS typically uses the concept of a file system (FS). Implementing 
any FS is a very complex task, and the implementation usually 
depends heavily on the underlying OS. 
But the typical set of requirements for a local FS running on a 
particular computer does not include features such as high 
availability in a cluster or effective support of distributed 
operations. 

The proposed concept of Virtual Environments (VE) for each ASP 
end user requires that each instance of a VE have its own set of 
files, including system ones. 

Technically, all operations inside a VE could be done by the local 
FS. For example, all write operations could be done locally. 

Considering Linux as a platform for ASP, we note that no 
implementation of a local FS meets the following set of 
ASP platform requirements: 

• distributed access 

• hierarchical uniform naming space 

• high performance (fast access to any file in the cluster) 

• scalability - support of hundreds of computers in the network 

• consistency - journaling support and transactional 
features, including fast fault recovery 

• fault tolerance - any computer can be switched off 
without losing access to any data in the file system; any 
network connection can disappear 

• self-configuration - easy growth - any computer can be 
attached to the network without user intervention in the 
software (only the hardware connection needs to be 
made) 

• security - ACL-type access or (at least) UNIX-style 
group security; optional encryption 

Some of these features map directly to local FS abilities, 
some do not. For example, a VE does not need distributed 
features from the inside; fault tolerance is not a question of the 
functionality of software run inside the VE; and high performance 
is not a real concern of the typical ASP user. The problems 
mentioned are problems of the ASP as a service provider: the 
quality of its service, maintainability, etc. 

On the other hand, producing a new file system with all the 
mentioned features could be a very time- and resource-consuming 
task. 

A reasonable compromise could be the development of a storage 
server as an add-on to the local file system. 

Taking into account the current features of Linux FS, we can 
define the following set of data storage server requirements: 

• easy mapping to the semantics of the local FS on Linux - ASPdS 
should operate with the same concepts as the local FS (or at 
least implement a superset of them) 

• easy integration with the local FS - the local FS has to obtain 
new features by calling ASPdS services, and has to do so 
efficiently 

• distributed access to files (any node in the cluster can access 
them) 

• scalability - close-to-linear extensibility in the number of 
nodes; adding a new node to the cluster should not affect 
overall performance 

• controlled high availability of data - the level of fault 
tolerance should be controllable in a range from what the local 
FS supports (usually nothing) up to transaction-based 
(journaled), with every recorded operation immediately available 

• maintainability - ease of installation and handling 
(minimization of TCO) 

• locking support for distributed access to files in the global 
namespace 

Interestingly, we don't need a special locking feature for 
typical operation: each VE runs under the control of a single OS 
kernel on a given hardware unit and supports any locking provided 
by the underlying OS. Distributed applications should utilize a 
couple of VEs on different nodes and use an existing 
network-distributed SDK (an MPI or PVM environment, for 
example). 

In such an implementation we use the local FS as a file cache for 
global data storage. 

Some applications (parallel database engines in a cluster, for 
example) may want high-availability distributed data storage. 
ASPdS with appropriate locking support and possible tuning of 
the block allocation policy may provide such functionality. 

What could be traded for these features? Probably some CPU 
power, some additional disk space and some network 
performance. The actual penalty needs to be determined 
(measured) later. 



3. Design Principles 

Here I describe the main principles of organization of the ASP 
cluster: 

• Equality of all nodes: 

each node in the cluster has the same control functions 

• Locality of all algorithms: 

each node has information only about the adjacent 
neighbors it is interested in and never has any global tables 

• Reuse of existing file system features: 

each file is stored on a local filesystem; applications access 
data via well-balanced and robust implementations of 
native Linux filesystems 

• Regulated redundancy of data: 
each file is stored in the form of pieces, and the number of 
pieces can vary; each piece exists in only one copy in the 
system (no traditional caching - we just cache an additional 
number of pieces). Pieces are stored on distributed nodes, 
providing high availability. Filesystems may be restored even 
if some pieces are absent. 

• Versioned files: 

each file is stored as a set of transaction data (treat them as 
versions of the file), and a version cannot be changed in either 
size or data (instead we just generate a new transaction 
with a new file version). There will be user-controlled tools to 
access a previous version of a given file, list version 
information and restore a file to a specified version. 

• Timestamped versions: 

each transaction is marked with the current time and a 
transaction ID, in order to provide access to "snapshots" of 
the filesystem for a given time while maintaining a unique ID. 

• Uniformly distributed network loading: 

we can move data in parallel from the network to a 
particular node (depending on the topology) and thus utilize 
the network channel with the minimal current load 

• Migration of information to the place of its active utilization: 
we can move pieces of a file to the nodes on which they are 
intensively used 

• Time should be synchronized on all nodes of the cluster. 

4. Network Topology 

The suggested physical network topology is a single segment within
a switched environment. For performance reasons a node may
want to know its place in the switched topology and the
throughput of its network interface. However, there is no direct
restriction on the physical topology; it may also be a LAN with an
arbitrary topology or even a globally distributed environment. The
only requirement is the presence of a uniform broadcast or
multicast group within the cluster.

Logically, a node autoconfigures its place among its nearest
neighbors to find active participants for redundant data storage.
The filesystem daemons maintain permanently open
connections to the node's neighbors.

In case of a neighbor failure, the parent node initiates a failover
switch to another neighbor. Once the previous neighbor is
available again, it sends a request to its parent node. The
parent node may respond by switching the connection back to the
restored node, or continue to work with the current neighbor,
depending on the amount of its data available on one or
the other node.

During normal operation nodes do not change their neighbors. The
number of neighbors used depends on the required data redundancy
(a configuration parameter). The algorithm for automatic neighbor
selection based on CPU/disk/network load needs to be addressed
later.

5. File representation

Each file consists of two parts:

• locally stored data to provide fast access to the latest file
version;

• remotely stored data to provide high availability and
versioning.

If the local representation is unavailable (disk failure,
node failure with the VE restarted on a different node, insufficient
disk space, etc.), it will be retrieved from the remote representation.

The local representation is just a native file representation on a local
filesystem. Only some parts of the file may be stored locally
(for performance/bandwidth/disk space reasons). The local
representation reuses the properties of the local filesystem (e.g.
journalling, very large file support, etc.). The proposed local
filesystem for the ASP cluster should definitely include
journalling support. Currently, there are four candidates for the
role:



reiserfs    http://devlinux.com/projects/reiserfs/
ext         http://web.mit.edu/tytso/www/linux/
JFS         http://oss.software.ibm.com/developerworks/opensoi
XFS         http://oss.sgi.com/projects/xfs/



Right now it is difficult to pick the best journalling FS for our
needs; all the above filesystems are currently at the development
and/or alpha testing stage. As of today reiserfs has been more
thoroughly tested, while IBM JFS has the most potential due to
its long history on AIX. Probably by the time of
the 2.4 Linux kernel release (summer 2000) there will be more
arguments for the choice.

The remote representation consists of a set of extents: fixed-size
blocks. (The block size itself could be made variable.) Files are
assembled/reassembled in user space daemons. It is not necessary
to assemble a whole file to provide access to it - only a
small part of it may be needed.

Each block can have any number of versions. Versions are
numbered and sequenced by a combination of a timestamp and a
transaction ID number unique for each node (to avoid problems with
low time resolution and time sync). The main idea of such
numbering is to detect the last version of each file block to
assemble.
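
The ordering described above could be sketched roughly as follows.
This is only an illustration: the struct, its field names and the use
of a node ID as the final tie-breaker are assumptions, not something
the text specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical version stamp; all field names are illustrative.
struct BlockVersion {
    std::int64_t  timestamp;  // time taken from the (synced) node clock
    std::uint32_t node_id;    // node that issued the transaction (assumed tie-breaker)
    std::uint64_t txn_id;     // transaction counter, unique within that node
};

// True if a is an older version than b: compare the timestamp first,
// then the transaction ID, then the node ID.
inline bool older(const BlockVersion& a, const BlockVersion& b) {
    return std::tie(a.timestamp, a.txn_id, a.node_id) <
           std::tie(b.timestamp, b.txn_id, b.node_id);
}

// Pick the latest version among a non-empty set of replies.
inline BlockVersion latest(const std::vector<BlockVersion>& replies) {
    assert(!replies.empty());
    BlockVersion best = replies.front();
    for (const BlockVersion& v : replies) {
        if (older(best, v)) best = v;
    }
    return best;
}
```

Two versions written within the same clock tick are still ordered,
because the transaction counter breaks the tie.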

Generally, the remote file representation consists of a set of blocks
grouped in transactions.

6. Regulated redundancy and versioning

Each block is stored in the following form: it is divided into N
pieces, and any K of them (K <= N) are enough to assemble the
initial block. The size of each piece is minimized and equal to
(size of block)/K, with a minimal overhead of some additional info
in a header. The appropriate mathematics exists.

Typically a block is divided into at least N = (K+M) pieces, where M
is the number of nodes which could disappear simultaneously. Each
piece is stored on a separate remote node. There are K+M
individual nodes with block pieces, and the removal of M of them
still leaves the ability to restore the block.

The maximum number of possible pieces is fixed at the very
initial stage of piece creation, and should be reasonably large
(up to K*L, where L is the maximum number of nodes in the cluster).

Each block piece exists in a single instance - we do not use any
copies; instead, we provide an additional piece.
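
As a rough illustration of the arithmetic above, the sketch below
computes the piece count, piece size and total stored size for one
block. The function and field names are hypothetical, and the fixed
per-piece header size is an assumption; the text only says the header
overhead is minimal.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of how one block is spread over nodes.
struct PieceLayout {
    std::size_t pieces;        // N = K + M
    std::size_t piece_bytes;   // ceil(block / K) plus assumed header size
    std::size_t stored_bytes;  // N * piece_bytes, summed over all nodes
};

// k: pieces needed to reassemble; m: nodes that may disappear at once.
inline PieceLayout plan_pieces(std::size_t block_bytes, std::size_t k,
                               std::size_t m, std::size_t header_bytes) {
    assert(k > 0);
    PieceLayout out;
    out.pieces = k + m;
    out.piece_bytes = (block_bytes + k - 1) / k + header_bytes;  // round up
    out.stored_bytes = out.pieces * out.piece_bytes;
    return out;
}
```

For example, a 65536-byte block with K = 4, M = 2 and a 16-byte
header gives 6 pieces of 16400 bytes, 98400 bytes in total, or about
1.5 times the block size. Surviving the loss of two nodes with plain
copies would need three full copies.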

7. File access 

File open:

• Check whether the local file copy is current. If the local copy
is up to date, use it. This check uses a network request only
once per session of the FS support daemons, so for typical
system usage it happens once per file after VE boot.

• If a local copy is not found:

o translate the logical name of the file into the physical IDs of
its blocks (via directory data)

o send a multicast request for the blocks to the network

o after receiving enough replies to assemble at least one
file extent, make a local copy of the
assembled data and return from the open call with success

o send block transfer packets (read-ahead)

• Think of:

o Cache of negative (file not found) replies.

Read access (the same for one or more blocks; it is not necessary
to assemble the whole file locally!):

• check whether the block to be read is available in the local
filesystem

• if available, read it and return

• if not available, send a search request, wait for the response,
and then send a piece transfer request. Immediately
after receiving a sufficient number of pieces, cancel all
other transfers and assemble the block in the local cache; read it
and return

After receiving a search request, each node has to return the block
with the latest (timestamp + transaction_counter).
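
The "stop as soon as enough pieces have arrived" step of the read
path could look roughly like this. The function name and the idea of
identifying pieces by a small integer index are assumptions made for
the sketch; the actual packet handling is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Given piece indices in the order their transfers complete, return how
// many replies had to be consumed before k distinct pieces were in hand.
// Returns 0 if the replies never yield k distinct pieces.
inline std::size_t replies_needed(const std::vector<unsigned>& piece_ids,
                                  std::size_t k) {
    assert(k > 0);
    std::set<unsigned> have;  // distinct pieces received so far
    for (std::size_t i = 0; i < piece_ids.size(); ++i) {
        have.insert(piece_ids[i]);  // a duplicate piece adds nothing
        if (have.size() >= k) {
            return i + 1;  // remaining transfers can be cancelled here
        }
    }
    return 0;
}
```

Duplicates are ignored because a second copy of the same piece does
not help reassembly; only K different pieces do.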

Write requests depend upon the consistency requirements.
Generally, we store the file locally and queue a disassemble request
for remote storing. After queuing the transaction, we return
success to userspace from the write request.

Like copy-on-write files, deleted files need to be marked in the local
VE cache too.

For files opened in synchronous write mode (O_SYNC) we could
wait for the remote storing procedure to complete the physical write
on remote neighbors. However, this kind of redundancy seems a
little paranoid to me, so an intermediate solution is suggested:
the write call waits for the physical operation of the local file
write and for the queued transaction to be recorded. In this way we
ensure that the locally stored data has gone to disk, and that the
remote storing requests have been journalled locally to physical media.

In general, we can provide all levels of synchronous write
support:

o no synchronous operation;
o synchronous update of local FS transaction log
(journalling local FS);
o synchronous update of local FS (UNIX O_SYNC);
o synchronous update of RCDFS remote store request;
o synchronous RCDFS remote store (wait for data to be
sent to neighbors);
o synchronous RCDFS remote store, waiting for the neighbors'
write to complete on physical media;

The typical applications for each FS mode need to be specified.
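
One possible way to represent these levels in code is an ordered
enumeration. The names below are invented for illustration, and the
sketch assumes that each level in the list includes the guarantees of
the ones before it, which the text suggests but does not state.

```cpp
#include <cassert>

// Hypothetical names for the six levels, weakest first.
enum class SyncLevel : int {
    None = 0,             // no synchronous operation
    LocalJournal,         // local FS transaction log updated
    LocalData,            // local FS data on disk (UNIX O_SYNC)
    RemoteRequestLogged,  // RCDFS remote store request recorded
    RemoteSent,           // data sent to neighbors
    RemoteOnMedia         // neighbors finished writing to physical media
};

// Convert a numeric configuration value (0..5) to a level.
inline SyncLevel level_from_int(int v) {
    assert(v >= 0 && v <= static_cast<int>(SyncLevel::RemoteOnMedia));
    return static_cast<SyncLevel>(v);
}

// True if a mount providing `provided` meets a request for `required`,
// under the assumption that higher levels include lower ones.
inline bool satisfies(SyncLevel provided, SyncLevel required) {
    return static_cast<int>(provided) >= static_cast<int>(required);
}
```

Under this reading, the intermediate O_SYNC behaviour suggested above
would correspond to the RemoteRequestLogged level.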

8. Garbage collection 

Another significant question for such a versioning system is
garbage collection - the removal of old block versions. Generally, we
post all blocks to the system only after the commit operation for a
particular transaction. This means that if we want to remove a
particular version of a block, we have to be sure that we can
assemble the next version of this block. The problem is that we have
to be sure that ALL blocks participating in the transaction are
available for assembling. Probably, the purge request will be sent by
the node posting the transaction after successful placement of all pieces.
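
The purge condition could be checked along these lines. This is a
sketch only: the function name is hypothetical, and `required` is left
as a parameter because the text proposes waiting for all pieces (N),
while K pieces per block is the minimum needed for reassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// pieces_placed[i] is the number of pieces of block i of the NEW
// transaction that have been confirmed as stored on remote nodes.
// The previous versions may be purged only if every block of the new
// transaction has at least `required` pieces placed.
inline bool can_purge_previous(const std::vector<std::size_t>& pieces_placed,
                               std::size_t required) {
    assert(required > 0);
    if (pieces_placed.empty()) return false;  // nothing committed yet
    for (std::size_t placed : pieces_placed) {
        if (placed < required) return false;  // one incomplete block blocks the purge
    }
    return true;
}
```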

9. Pieces distribution

The piece distribution algorithm has to place pieces on an appropriate
number of nodes (in agreement with the fault tolerance
requirements), and provide a migration mechanism to
minimize network load (just move each piece to the node on
which it will be used most intensively).

10. VE support 

Each VE within a local node should receive its own separate
filesystem namespace beginning from root. For the purpose of disk
space optimization, we will provide a "copy-on-write" mechanism
within a local node.

From the global node namespace (not within a VE) a given filesystem
subtree is assigned for VE support. Within it, each VE receives its
own subtree for the VE root namespace. Also, another
subtree is assigned the status of "VE template".

For file read access, the VE personal subtree (/.ve/VE-ID/...) is
checked first. If the file exists, it is accessed from there. If the
file does not exist there, the VE template subtree is checked. On
success, a special kind of link is created within the VE
personal subtree. This link copies all file attributes, while for the
file content the application is redirected to the VE template disk
space.

For file read-write (modify) access, we extend this algorithm with
a copy-on-write policy: if an application gets a template file and
modifies its contents, the file content is actually copied into the
personal VE subtree, replacing the special link.

File creation always occurs within the personal VE subtree.

/                     (global namespace root)
|-- usr
|   |-- bin
|   |-- lib
|-- .ve-template
|   /                 (template filesystem layout)
|   |-- bin
|-- .ve               (mount point of VE support)
    |-- 1
    |   /             (root of VE with id=1)
    |   |-- bin
    |-- 21
        /             (root of VE with id=21)
        |-- bin
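
The lookup order for VE files described in this section can be
modelled in memory as below. All type and function names are invented
for the sketch, the "filesystem" is just three sets of paths, and
treating an open-for-write of a missing file as a creation is an
assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of one VE: paths are relative to the VE root.
struct VeStore {
    std::set<std::string> personal;  // real files in the VE personal subtree
    std::set<std::string> links;     // special links into the template
    std::set<std::string> templ;     // files in the VE template subtree
};

enum class Source { Personal, Template, NotFound };

// Read access: personal subtree first, then the template. A template
// hit leaves a special link behind in the personal subtree.
inline Source open_read(VeStore& s, const std::string& path) {
    if (s.personal.count(path)) return Source::Personal;
    if (s.links.count(path)) return Source::Template;
    if (s.templ.count(path)) {
        s.links.insert(path);
        return Source::Template;
    }
    return Source::NotFound;
}

// Read-write access: a template file is copied into the personal
// subtree (copy on write), replacing the link. New files are always
// created in the personal subtree.
inline Source open_write(VeStore& s, const std::string& path) {
    Source src = open_read(s, path);
    if (src == Source::Template) s.links.erase(path);
    if (src != Source::Personal) s.personal.insert(path);
    return Source::Personal;
}
```

The template set is never modified, so every other VE on the node
keeps seeing the original file.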

11. Performance estimation

To be calculated.

12. Portability 

In general, most VE implementations are kernel-dependent and
cannot be easily ported to another UNIX. But the open and
highly modular approach used could significantly simplify such a task.

The implementation of the cluster part of this project is highly
portable and allows the use not only of different dialects of Unix
on different hardware platforms, but also of platforms such as
Windows 2000.

13. Implementation Strategy 

Like any network or distributed filesystem, RCDFS consists of
kernel-level FS modules and a set of user-mode processes
(daemons). The daemons run outside the VE in the user-mode
namespace of each node. At a later project stage, these
daemons may be rewritten as kernel-mode threads for
performance reasons.

Currently, the need for the following daemons comes to mind:

• ds-lookupd 

o node autoconfiguration 

o name lookup in the global cluster namespace 

• ds-cored 

o handle RCDFS remote read and write requests

o transaction id generation 

o handle read/write/delete block requests 

• ds-supportd 

o collect usage statistics
o garbage collection 

For broadcast traffic, IP multicasting may be used for easy 
access to an IP network with complex topology. 

14. Things to be addressed 

• Name lookup algorithm: method for caching negative (file not
found) replies locally.

• Disk / network bandwidth / CPU usage optimization criteria for
choosing neighbors.

• Interaction with the Linux buffer cache: is it a task for our
module, or do we just make a local file copy available and then
redirect the request to the local filesystem driver?

• Files stored locally only partially: method of redirecting a local
read request to a remote transaction.

• Access to previous file versions: define semantics and user
tools.

• Disk quotas.

• Implementation details of garbage collection.

• Global namespace support (cluster-visible).

• Distributed locking support.



Document Owner: Yuri Pudgorodsky, ASPLinux yur@sw.mipt.ru
Base Document: Project Wolf, Technical Vision from Alex

head 1.13;
access;
symbols
    aspfs_0_9_1_dev:1.13
    aspfs_0_9_dev:1.12;
locks; strict;
comment @ * @;


1.13
date 2002.10.21.20.07.14; author dev; state Exp;
branches;
next 1.12;

1.12
date 2001.06.06.09.29.18; author media; state Exp;
branches;
next 1.11;

1.11
date 2001.05.14.13.48.45; author media; state Exp;
branches;
next 1.10;

1.10
date 2001.05.02.12.12.18; author dev; state Exp;
branches;
next 1.9;

1.9
date 2001.04.13.09.23.57; author media; state Exp;
branches;
next 1.8;

1.8
date 2001.04.12.09.51.01; author media; state Exp;
branches;
next 1.7;

1.7
date 2001.04.10.14.55.41; author media; state Exp;
branches;
next 1.6;

1.6
date 2001.03.31.14.24.19; author media; state Exp;
branches;
next 1.5;

1.5
date 2001.03.30.10.31.07; author media; state Exp;
branches;
next 1.4;

1.4
date 2001.03.22.09.14.13; author media; state Exp;
branches;
next 1.3;

1.3
date 2001.03.12.11.18.26; author media; state Exp;
branches;
next 1.2;

1.2
date 2001.02.26.12.15.11; author media; state Exp;
branches;
next 1.1;

1.1
date 2001.02.15.14.04.50; author media; state Exp;
branches;
next ;



desc 

@@ 



1.13 
log 

@Added WorkDir

@ 

text 
@/*
 * Copyright (C) SWsoft 2000-2001
 *
 * Author: Ruslan Iljin <media@@www.rt.mipt.ru>
 *
 * $Id: config.h,v 1.12 2001/06/06 09:29:18 media Exp $
 *
 * Description: headers for config file processing
 */

#ifndef _TOPD_CONFIG_H_
#define _TOPD_CONFIG_H_

#ifdef __cplusplus
extern "C" {
#endif

#ifdef DEFINE_CONFIG_VARIABLES
#define DECLARE
#define DEFAULT(x) =x;
#else
#define DEFAULT(x)
#ifdef __cplusplus
#define DECLARE extern "C"
#else
#define DECLARE extern
#endif
#endif

int parse_config(const char* conf_name, int looketc);
DECLARE void add_neighbour_from_config(char *neigh);

DECLARE char* BindToAddress DEFAULT("0.0.0.0");
DECLARE char* WorkDir DEFAULT("-/storage");
DECLARE char* CoordAddress DEFAULT(NULL);
DECLARE int Daemon DEFAULT(1);

/* timeouts */
DECLARE int TOPD_TIMEOUT_RECONNECT DEFAULT(30);
DECLARE int TOPD_TIMEOUT_STAGE_0 DEFAULT(5);
DECLARE int TOPD_TIMEOUT_WRITE_STAGE_1 DEFAULT(20);
DECLARE int TOPD_TIMEOUT_WRITE_STAGE_2 DEFAULT(20);
DECLARE int TOPD_TIMEOUT_READ_STAGE_1 DEFAULT(60);
DECLARE int TOPD_TIMEOUT_READ_STAGE_2 DEFAULT(60);
DECLARE int TOPD_TIMEOUT_SNAP_STAGE_1 DEFAULT(15);
DECLARE int TOPD_TIMEOUT_SNAP_STAGE_2 DEFAULT(20);
DECLARE int TOPD_TIMEOUT_DIRREC_STAGE_1 DEFAULT(15);
DECLARE int TOPD_TIMEOUT_BLOCK0_STAGE_1 DEFAULT(15);
DECLARE int TOPD_TIMEOUT_PROXY DEFAULT(10);

#ifdef __cplusplus
} // extern "C"
#endif

#undef _TOPD_READ_ALL_SLICES_ /* read all available slices? if undef read only required number of slices */
#undef _TOPD_REMOVE_AFTER_READ_ /* remove snap entry after reading? */
#undef _TOPD_LOOP_PING_ /* continue pinging neighbours for dynamic table? */

#endif
@



1.12
log
@*** empty log message ***
@
text
@d6 1
a6 1
 * $Id: config.h,v 1.11 2001/05/14 13:48:45 media Exp $
d36 1
@


1.11
log
@fixed many bugs in lists
@
text
@d6 1
a6 1
 * $Id: config.h,v 1.10 2001/05/02 12:12:18 dev Exp $
d56 2
@


1.10
log
@Global code review.
TFS_assert -> ASSERT
removed all tfsLog functions, debug() is used instead
SWSoft -> SWsoft :)
@
text
@d6 1
a6 1
 * $Id: config.h,v 1.9 2001/04/13 09:23:57 media Exp $
d33 2
d37 1
a37 1
DECLARE char* CoordAddress DEFAULT("0.0.0.0");
@


1.9
log
@remove trailing spaces
@
text
@d2 1
a2 1
 * Copyright (C) SWSoft 2000-2001
d4 3
a6 1
 * $Id: config.h,v 1.8 2001/04/12 09:51:01 media Exp $
a8 2
 * Author: Ruslan Iljin <media@@www.rt.mipt.ru>
@


1.8
log
@header files description
@
text
@d4 1
a4 1
 * $Id: config.h,v 1.41 2001/04/12 08:26:34 media Exp $
d11 1
a11 1
@



1.7
log
@code review
@
text
@d1 11
d39 1
a39 1
/* timeouts
d44 2
a45 2
DECLARE int TOPD_TIMEOUT_WRITE_STAGE_1 DEFAULT(20);
DECLARE int TOPD_TIMEOUT_WRITE_STAGE_2 DEFAULT(20);
d47 2
a48 2
DECLARE int TOPD_TIMEOUT_READ_STAGE_1 DEFAULT(60);
DECLARE int TOPD_TIMEOUT_READ_STAGE_2 DEFAULT(60);
d50 2
a51 2
DECLARE int TOPD_TIMEOUT_SNAP_STAGE_1 DEFAULT(15);
DECLARE int TOPD_TIMEOUT_SNAP_STAGE_2 DEFAULT(20);
d53 1
a53 1
DECLARE int TOPD_TIMEOUT_DIRREC_STAGE_1 DEFAULT(15);
@



1.6
log
@changes in dyn tables
@
text
@d28 17
a44 1
DECLARE int TOPD_TIMEOUT_RECONNECT DEFAULT(30);
@


1.5
log
@changes for new analize functions
@
text
@d34 3
a36 3
#define _TOPD_READ_ALL_SLICES_ 0 /* read all available slices? if 0 read only required number of slices */
#define _TOPD_REMOVE_AFTER_READ_ 0 /* remove snap entry after reading? */
#define _TOPD_LOOP_PING_ 0 /* continue pinging neighbours for dynamic table? */
@


1.4
log
@*** empty log message ***
@
text
@d1 2
a2 2
#ifndef CONFIG_H
#define CONFIG_H
d33 4
@


1.3
log
@*** empty log message ***
@
text
@d28 2
@


1.2
log
@*** empty log message ***
@
text
@d24 2
@


1.1
log
@*** empty log message ***
@
text
@a22 12
DECLARE char* TopdAddress DEFAULT("127.0.0.1");
DECLARE char* WorkDir DEFAULT("-/clroot/");
DECLARE char* TmpDir DEFAULT("/tmp/");
DECLARE int PutFileMaxBloxPerStep DEFAULT(100);
DECLARE int GetFileMaxBloxPerStep DEFAULT(100);
DECLARE int TimeOutPutOBlock DEFAULT(5);
DECLARE int TimeOutGetSnap DEFAULT(5);
DECLARE int TimeOutTopdReconnect DEFAULT(5);
@



head 1.34; 

access; 

symbols 

    aspfs_0_9_1_dev:1.34
    aspfs_0_9_dev:1.34;

locks; strict; 

comment @// @; 



1.34 

date 2001.06.19.13.15.53; author media; state Exp; 

branches; 

next 1.33; 

1.33 

date 2001.06.06.09.29.18; author media; state Exp; 

branches ; 
next 1.32; 

1.32 

date 2001.06.01.12.22.23; author media; state Exp; 

branches ; 
next 1.31; 

1.31 

date 2001.06.01.10.22.34; author media; state Exp; 

branches ; 
next 1.30; 

1.30 

date 2001.05.02.12.12.18; author dev; state Exp; 

branches; 

next 1.29; 

1.29 

date 2001.04.25.16.18.56; author media; state Exp; 

branches; 

next 1.28; 

1.28 

date 2001.04.20.10.46.27; author media; state Exp; 

branches; 

next 1.27; 

1.27 

date 2001.04.20.10.26.04; author media; state Exp; 

branches ; 
next 1.26; 

1.26 

date 2001.04.20.09.23.00; author media; state Exp; 

branches; 

next 1.25; 

1.25 

date 2001.04.19.11.10.46; author media; state Exp;

branches ; 
next 1.24; 

1.24 

date 2001.04.19.08.22.10; author media; state Exp; 

branches; 

next 1.23; 



1.23 

date 2001.04.13.09.23.57; author media; state Exp; 

branches; 

next 1.22; 

1.22 

date 2001.04.12.11.37.14; author media; state Exp; 

branches; 

next 1.21; 



1.21 

date 2001.04.12.11.35.27; author media; state Exp;

branches; 

next 1.20; 



1.20 

date 2001.04.12.11.33.26; author media; state Exp; 

branches ; 
next 1.19; 



1.19 

date 2001.04.12.10.42.10; author media; state Exp; 

branches; 

next 1.18; 



1.18

date 2001.04.12.10.34.20; author media; state Exp; 

branches ; 
next 1.17; 



1.17 

date 2001.04.12.10.26.35; author media; state Exp; 

branches ; 
next 1.16; 

1.16 

date 2001.04.04.10.33.33; author media; state Exp; 

branches; 

next 1.15; 



1.15 

date 2001.03.30.12.43.36; author media; state Exp; 

branches; 

next 1.14; 



1.14 

date 2001.03.23.15.29.23; author sinw; state Exp; 

branches; 

next 1.13; 



1.13 

date 2001.02.23.11.15.18; author media; state Exp; 

branches; 

next 1.12; 



1.12 

date 2001.02.22.14.52.34; author media; state Exp; 

branches; 

next 1.11;



1.11 

date 2001.01.30.17.48.43; author media; state Exp; 

branches ; 
next 1.10; 



1.10
date 2001.01.30.17.29.44; author media; state Exp;
branches;
next 1.9;




.11.43.19; author media; state Exp; 



.09.38.18; author media; state Exp; 



.14.4 0.12; author media; state Exp; 



1.6 

date VHHHfll- 15. 11.41; author media; state Exp; 

branches ; 
next 1.5; 

1.5 

date HMA.14.20.48; author media; state Exp; 

branches ; 
next 1.4; 

1.4

date WMMBHflfll-13.01.il; author media; state Exp; 

branches ; 
next 1.3; 



1.3 

date flHflflfl.13.18.39; author media; state Exp; 
branches ; 
next 1.2; 



1.2 

date flflflflHBflfl. 08.57.38; author media; state Exp; 

branches ; 
next 1.1; 



1.1 

date flIMflfli . 13 .21 .27; author media; state Exp; 

branches; 

next ; 



desc 

@@ 



1.34 
log 

@*** empty log message ***

@ 

text 
@/*
 * Copyright (C) SWsoft 2000-2001
 *
 * Author: Ruslan Iljin <media@@www.rt.mipt.ru>
 *
 * $Id: fserv.cc,v 1.33 2001/06/06 09:29:18 media Exp $
 *
 * Description: this file contains interface to file server
 *
 */

#include "topd.h"

/*
 * function     : tfsTopd::fserv_send
 * prototype in : topd.h
 * description  : send packet to client channel
 * parameters   : p - packet to send
 * return       : int - send count
 * errors       : TFS_E_TOPD_FSERV_WRONG
 */
int tfsTopd::fserv_send(tfsPKT *p)
{
    int count=0;
    INFUNC(tfsTopd::fserv_send);
    if (!C_fserv) {
        PRINTD2("fileserver_send: there is no fserv");
        OUTFUNCINT(TFS_E_TOPD_FSERV_WRONG);
    }
    if ((C_fserv->state == tfsChannel::CONNECTED) || (C_fserv->state == tfsChannel::OPENED))
    {
        PRINTD2("Packet sent to fserv");
        C_fserv->Pu*iQ(p);
        count++;
    }
    OUTFUNCINT(count);
}

/*
 * function     : fserv_common_request
 * prototype in : topd.h
 * description  : send snap, read, dirrec, write request to fserv
 * parameters   : reqid - RequestID of snap request
 *                buff - address of data buffer
 *                length - length of data buffer
 *                req_type - request type
 * return       : int - result of fserv_send
 * errors       : TFS_E_TOPD_NO_MEMORY
 */
int tfsTopd::fserv_common_request(tfsRequestID reqid, char* buff, int length, int req_type)
{
    int rc;
    tfsPktType p_type[7]={
        TFS_MSG_T2F_SNAP_REQ,
        TFS_MSG_T2F_SLICE_READ_REQ,
        TFS_MSG_T2F_DIRRECORD_LIST_REQ,
        TFS_MSG_T2F_BLOCK0_LIST_REQ,
        TFS_MSG_T2F_SLICE_WRITE_REQ,
        TFS_MSG_T2F_ZEROBLOCK_WRITE_REQ,
        TFS_MSG_T2F_DIRRECORD_WRITE_REQ};

    INFUNC(tfsTopd::fserv_common_request);

    if (req_type==TOPD_SNAP_REQUEST) PRINTD2("send snap request to fserv");
    if (req_type==TOPD_READ_REQUEST) PRINTD2("send read request to fserv");
    if (req_type==TOPD_DIRREC_REQUEST) PRINTD2("send directory record list request to fserv");
    if (req_type==TOPD_BLOCK0_REQUEST) PRINTD2("send block0 list request to fserv");
    if (req_type==TOPD_SLICE_WRITE) PRINTD2("send slice write request to fserv");
    if (req_type==TOPD_ZEROBLOCK_WRITE) PRINTD2("send zeroblock write request to fserv");
    if (req_type==TOPD_DIRREC_WRITE) PRINTD2("send directory record write request to fserv");

    tfsPKT *p = new tfsPKT(p_type[req_type], length, buff, reqid);
    if (!p) {
        PRINTD("ERROR: not enough memory");
        OUTFUNCINT(TFS_E_TOPD_NO_MEMORY);
    }
    rc=fserv_send(p);
    if (TFS_E_CHECK(rc)) rc=0;
    OUTFUNCINT(rc);
}

/*
 * function     : fserv_search_reply
 * prototype in : topd.h
 * description  : processing snap reply from fserv
 * parameters   : p - packet with reply
 * return       : int - result of initiate function
 * errors       : apropriate to initiate function
 */
int tfsTopd::fserv_search_reply(tfsPKT *p)
{
    int rc;
    INFUNC(tfsTopd::fserv_search_reply);
    PRINTD2("got slice search reply from fserv");
    if (header_get_hops_number(p->data, p->length)) rc=initiate_search_reply(p, TOPD_SLICE_SEARCH_REPLY);
    else rc=search_reply(p, get_neighbour_number(myself), TOPD_SLICE_SEARCH_REPLY);
    OUTFUNCINT(rc);
}

/*
 * function     : fserv_read_reply
 * prototype in : topd.h
 * description  : processing read reply from fserv
 * parameters   : p - packet with reply
 * return       : int - result of initiate function
 * errors       : apropriate to initiate function
 */
int tfsTopd::fserv_read_reply(tfsPKT *p)
{
    int rc;
    INFUNC(tfsTopd::fserv_read_reply);
    PRINTD2("got slice read reply from fserv");
    if (header_get_hops_number(p->data, p->length)) rc=initiate_search_reply(p, TOPD_SLICE_READ_REPLY);
    else rc=search_reply(p, get_neighbour_number(myself), TOPD_SLICE_READ_REPLY);
    OUTFUNCINT(rc);
}

/*
 * function     : fserv_dirrecord_reply
 * prototype in : topd.h
 * description  : processing dirrecord reply from fserv
 * parameters   : p - packet with reply
 * return       : int - result of initiate function
 * errors       : apropriate to initiate function
 */
int tfsTopd::fserv_dirrecord_reply(tfsPKT *p)
{
    int rc;
    INFUNC(tfsTopd::fserv_dirrecord_reply);
    PRINTD2("got dirrec reply from fserv");
    if (header_get_hops_number(p->data, p->length)) rc=initiate_search_reply(p, TOPD_DIRREC_SEARCH_REPLY);
    else rc=search_reply(p, get_neighbour_number(myself), TOPD_DIRREC_SEARCH_REPLY);
    OUTFUNCINT(rc);
}

/*
 * function     : fserv_block0_reply
 * prototype in : topd.h
 * description  : processing block0 reply from fserv
 * parameters   : p - packet with reply
 * return       : int - result of initiate function
 * errors       : apropriate to initiate function
 */
int tfsTopd::fserv_block0_reply(tfsPKT *p)
{
    int rc;
    INFUNC(tfsTopd::fserv_block0_reply);
    PRINTD2("got block0 reply from fserv");
    if (header_get_hops_number(p->data, p->length)) rc=initiate_search_reply(p, TOPD_BLOCK0_SEARCH_REPLY);
    else rc=search_reply(p, get_neighbour_number(myself), TOPD_BLOCK0_SEARCH_REPLY);
    OUTFUNCINT(rc);
}

@ 



1.33
log
@*** empty log message ***
@
text
@d6 1
a6 1
 * $Id: fserv.cc,v 1.32 2001/06/01 12:22:23 media Exp $
a161 1

@


1.32
log
@*** empty log message ***
@
text
@d6 1
a6 1
 * $Id: fserv.cc,v 1.31 2001/06/01 10:22:34 media Exp $
d56 1
a56 1
    tfsPktType p_type[6]={
d60 1
d70 1
d91 1
a91 3
 * parameters   : reqid - RequestID of snap request
 *                buff - address of data buffer
 *                length - length of data buffer
d110 1
a110 3
 * parameters   : reqid - RequestID of snap request
 *                buff - address of data buffer
 *                length - length of data buffer
d129 1
a129 3
 * parameters   : reqid - RequestID of snap request
 *                buff - address of data buffer
 *                length - length of data buffer
d141 19
@


1.31
log
@*** empty log message ***
@
text
@d6 1
a6 1
 * $Id: fserv.cc,v 1.30 2001/05/02 12:12:18 dev Exp $
d95 1
a95 1
int tfsTopd::fserv_search_reply(tfsRequestID reqid, char* buff, int length)
d101 2
a102 2
    if (header_get_hops_number(buff, length)) rc=initiate_search_reply(reqid, buff, length, TOPD_SLICE_SEARCH_REPLY);
    else rc=search_reply(reqid, get_neighbour_number(myself), buff, length, TOPD_SLICE_SEARCH_REPLY);
d116 1
a116 1
int tfsTopd::fserv_read_reply(tfsRequestID reqid, char* buff, int length)
d122 2
a123 2
    if (header_get_hops_number(buff, length)) rc=initiate_search_reply(reqid, buff, length, TOPD_SLICE_READ_REPLY);
    else rc=search_reply(reqid, get_neighbour_number(myself), buff, length, TOPD_SLICE_READ_REPLY);
d137 1
a137 1
int tfsTopd::fserv_dirrecord_reply(tfsRequestID reqid, char* buff, int length)
d143 2
a144 2
    if (header_get_hops_number(buff, length)) rc=initiate_search_reply(reqid, buff, length, TOPD_DIRREC_SEARCH_REPLY);
    else rc=search_reply(reqid, get_neighbour_number(myself), buff, length, TOPD_DIRREC_SEARCH_REPLY);
@



1.30
log
@Global code review.
TFS_assert -> ASSERT
removed all tfsLog functions, debug() is used instead
SWSoft -> SWsoft :)
@
text
@d6 1
a6 1
 * $Id: fserv.cc,v 1.29 2001/04/25 16:18:56 media Exp $
d79 2
@


1.29
log
@*** empty log message ***
@
text
@d2 1
a2 1
 * Copyright (C) SWSoft 2000-2001
d4 3
a6 1
 * $Id: fserv.cc,v 1.28 2001/04/20 10:46:27 media Exp $
a8 2
 *
 * Author: Ruslan Iljin <media@@www.rt.mipt.ru>
@


1.28
log
@join read request
@
text
@d4 1
a4 1
 * $Id: fserv.cc,v 1.27 2001/04/20 10:26:04 media Exp $
d100 1
a100 1
    else rc=search_reply(reqid, get_neighbour_number(myself), buff, length, TOPD_DIRREC_SEARCH_REPLY);
@


1.27
log
@final search request imple, entation
@
text
@d4 1
a4 1
 * $Id: fserv.cc,v 1.26 2001/04/20 09:23:00 media Exp $
d120 2
a121 2
    if (header_get_hops_number(buff, length)) rc=initiate_slice_transfer_reply(reqid, buff, length);
    else rc=slice_transfer_reply(reqid, get_neighbour_number(myself), buff, length);
@


1.26
log
@join search proc
@
text
@d4 1
a4 1
 * $Id: fserv.cc,v 1.25 2001/04/19 11:10:46 media Exp $
d100 1
a100 1
    else rc=slice_search_reply(reqid, get_neighbour_number(myself), buff, length);
d142 1
a142 1
    else rc=dirrecord_reply(reqid, get_neighbour_number(myself), buff, length);
@


1.25
log
@add TOPD prefix to const
@
text
@d4 1
a4 1
 * $Id: fserv.cc,v 1.24 2001/04/19 08:22:10 media Exp $
d99 1
a99 1
    if (header_get_hops_number(buff, length)) rc=initiate_slice_search_reply(reqid, buff, length);
d141 1
a141 1
    if (header_get_hops_number(buff, length)) rc=initiate_dirrecord_reply(reqid, buff, length);
@



1.24 
log 

©remove neighbour address attachment 

@ 

text 
@d4 1 
a4 1 

* $Id: fserv.ccv 1.23 2001/04/13 09:23:57 media Exp $ 
d66 6 
a71 6 

if ( req_type==SNAP_REQUEST) PRINTD2 ( " send snap request to fserv"); 
if (req_type==READ_REQUEST) PRINTD2 ( " send read request to fserv"); 
if (req_type==DIRREC_REQUEST) PRINTD2 ( "send directory record list request to 
fserv") ; 

if (req__type==SLICE_WRITE) PRINTD2 ( "send slice write request to fserv"); 
if ( req_t ype = = ZEROBLOCK_WRITE ) PRINTD2 ( "send zeroblock write request to 
fserv") ; 

if (req_type==DIRREC_WRITE) PRINTD2 ( "send directory record write request to 
fserv" ) ; 
@ 



1.23 
log 

©remove trailing spaces 
© 

text 



@d4 1 
a4 1 

* $Id: fserv.ccv 1.22 2001/04/12 11:37:14 media Exp $ 
dlOO 1 
alOO 1 

else rc=slice_search_reply(reqid, myself ->Address, buff, length); 
dl21 1 
al21 1 

else rc=slice_transfer_reply (reqid, myself ->Address, buff, length); 
dl42 l 
al42 1 

else rc=dirrecord_reply (reqid, myself ->Address , buff, length); 

@ 



1.22 
log 

©change no memory debug level 

@ 

text 
@d4 1 
a4 1 

* Sid: fserv.ccv 1.21 2001/04/12 11:35:27 media Exp $ 
d77 1 
a77 1 

} 

@ 



1.21 
log 

©no memory check 
© 

text 
@d4 1 
a4 1 

* $Id: fserv.ccv 1.20 2001/04/12 11:33:26 media Exp $ 
d75 1 
a75 1 

PRINTD1 ( "ERROR: not enough memory"); 

@ 



1.20 
log 

@*** empty log message *** 

@ 

text 
@d4 1 
a4 1 

* $Id: fserv.ccv 1.19 2001/04/12 10:42:10 media Exp $ 
d74 4 

© 



1.19 

log 

©fserv desription 
© 

text 
@d4 1 
a4 1 

* $Id: fserv. cc,v 1.18 2001/04/12 10:34:20 media Exp $ 
d64 1 



a64 1 

INFUNC (tf sTopd: : f serv_write_re quest ) ; 

@ 



1.18 
log 

@*** empty log message ***

@

text
@d4 1
a4 1
 * $Id: fserv.cc,v 1.17 2001/04/12 10:26:35 media Exp $
d41 11
d55 1
d65 1
d72 1
d75 1
d79 10
d100 31
a141 10
int tfsTopd::fserv_read_reply(tfsRequestID reqid, char* buff, int length)
{
    int rc;
    INFUNC(tfsTopd::fserv_read_reply);
    PRINTD2("got slice read reply from fserv");
    if (header_get_hops_number(buff, length))
        rc=initiate_slice_transfer_reply(reqid, buff, length);
    else rc=slice_transfer_reply(reqid, myself->Address, buff, length);
    OUTFUNCINT(rc);
}

@ 



1.17 
log 

@header files description

@

text
@d4 1
a4 1
 * $Id: fserv.cc,v 1.16 2001/04/04 10:33:33 media Exp $
d55 1
a55 1
if (req_type==DIRREC_REQUST) PRINTD2("send directory record list request to fserv");
@ 



1.16 
log 

@code review
@

text
@d4 1
a4 1
 * Author: Ruslan Iljin <media@@www.rt.mipt.ru>
d6 1
a6 1
 * $Id: fserv.cc,v 1.15 2001/03/30 12:43:36 media Exp $
d8 1
a8 6
 */

/*
 * fserv.cc
 *
 * This file contains interface to file server
d14 8
d28 1
a28 1
PRINTD2("fileserver_send: there is no fserv\n");
d34 1
a34 1
PRINTD2("Packet sent to fserv\n");
d41 1
a41 1
int tfsTopd::fserv_write_request(tfsRequestID reqid, char* buff, int length, int req_type)
d44 7
a50 1
tfsPktType p_type[3]={TFS_MSG_T2F_ZEROBLOCK_WRITE_REQ, TFS_MSG_T2F_SLICE_WRITE_REQ, TFS_MSG_T2F_DIRRECORD_WRITE_REQ};
d53 6
a58 3
if (req_type==SLICE_WRITE) PRINTD2("send slice write request to fserv\n");
if (req_type==ZEROBLOCK_WRITE) PRINTD2("send zeroblock write request to fserv\n");
if (req_type==DIRREC_WRITE) PRINTD2("send directory record write request to fserv\n");
a63 22
int tfsTopd::fserv_search_request(tfsRequestID reqid, char* buff, int length)
{
    int rc;
    INFUNC(tfsTopd::fserv_search_request);
    PRINTD2("send slice search request to fserv\n");
    tfsPKT *p = new tfsPKT(TFS_MSG_T2F_SNAP_REQ, length, buff, reqid);
    rc=fserv_send(p);
    OUTFUNCINT(rc);
}

int tfsTopd::fserv_dirrecord_request(tfsRequestID reqid, char* buff, int length)
{
    int rc;
    INFUNC(tfsTopd::fserv_dirrecord_request);
    PRINTD2("send dirrec request to fserv\n");
    tfsPKT *p = new tfsPKT(TFS_MSG_T2F_DIRRECORD_LIST_REQ, length, buff, reqid);
    rc=fserv_send(p);
    OUTFUNCINT(rc);
}

d69 1
a69 1
PRINTD2("got slice search reply from fserv\n");
d80 1
a80 1
PRINTD2("got dirrec reply from fserv\n");
a85 11
int tfsTopd::fserv_read_request(tfsRequestID reqid, char* buff, int length)
{
    int rc;
    INFUNC(tfsTopd::fserv_read_request);
    PRINTD2("send slice read request to fserv\n");
    tfsPKT *p = new tfsPKT(TFS_MSG_T2F_SLICE_READ_REQ, length, buff, reqid);
    rc=fserv_send(p);
    OUTFUNCINT(rc);
}
d91 1
a91 1
PRINTD2("got slice read reply from fserv\n");

@ 



1.15
log

@removed tabs spaces
@

text
@d4 1
a4 1
 * Author: Ruslan Iljin <media@@www.rt.mipt.ru>
d6 1
a6 1
 * $Id: fserv.cc,v 1.14 2001/03/23 15:29:23 sinw Exp $
d10 2
a11 2
/*
 * fserv.cc
d23 1
d26 1
a26 1
return -1;
d29 2
a30 2
if ((C_fserv->state == tfsChannel::CONNECTED) || (C_fserv->state == tfsChannel::OPENED))

{ 

d34 2 
a35 2 
} 

return count; 
d40 1 
d42 2 
d47 3 
a49 2 

tfsPKT *p = new tfsPKT(p_type[req_type], length, buff, reqid);
fserv_send(p);
d51 1
a51 1

d54 7
a60 3
PRINTD2("send slice search request to fserv\n");
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_SNAP_REQ, length, buff, reqid);
fserv_send(p);
d65 7
a71 3
PRINTD2("send dirrec request to fserv\n");
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_DIRRECORD_LIST_REQ, length, buff, reqid);
fserv_send(p);
d76 7
a82 3
PRINTD2("got slice search reply from fserv\n");
if (header_get_hops_number(buff, length)) initiate_slice_search_reply(reqid, buff, length);
else slice_search_reply(reqid, myself->Address, buff, length);
d87 7
a93 3
PRINTD2("got dirrec reply from fserv\n");
if (header_get_hops_number(buff, length)) initiate_dirrecord_reply(reqid, buff, length);
else dirrecord_reply(reqid, myself->Address, buff, length);
d98 7
a104 3
PRINTD2("send slice read request to fserv\n");
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_SLICE_READ_REQ, length, buff, reqid);
fserv_send(p);
d109 7
a115 3
PRINTD2("got slice read reply from fserv\n");
if (header_get_hops_number(buff, length)) initiate_slice_transfer_reply(reqid, buff, length);
else slice_transfer_reply(reqid, myself->Address, buff, length);

@ 



1.14 
log 

@Change copyright sinw@@

@

text
@d6 1
a6 1
 * $Id: /home/cvs/aspfs/topd/fserv.cc,v 1.13 2001/02/23 11:15:18 media Exp $
d24 2
a25 2
PRINTD2("fileserver_send: there is no fserv\n");
return -1;
d29 5
a33 5
{
PRINTD2("Packet sent to fserv\n");
C_fserv->PutQ(p);
count++;
}

@ 



1.13 
log 

@*** empty log message *** 

@ 

text 
@d2 1 
d4 1 
a4 1
 * Copyright (C) SWSoft 1999-2000
d6 1
a6 6
 * Author: Ruslan Iljin
 * E-mail: media@@www.rt.mipt.ru
 *
 *
 * Last correct:
@

$Header: /home/cvs/aspfs/topd/fserv.cc,v 1.12 2001/02/22 14:52:34 media Exp $



1.12
log

@*** empty log message ***

@

text
@d8 1
a8 1
 * $Header: /home/cvs/aspfs/topd/fserv.cc,v 1.11 2001/01/30 17:48:43 media Exp $
d58 1
a58 1
int tfsTopd::fserv_params_request(tfsRequestID reqid, char* buff, int length)
d60 2
a61 2
PRINTD2("send params request to fserv\n");
tfsPKT *p = new tfsPKT(TFS_MSG_T2F_PARAMS_REQ, length, buff, reqid);
d72 1
a72 1
int tfsTopd::fserv_params_reply(tfsRequestID reqid, char* buff, int length)
d74 3
a76 3
PRINTD2("got params reply from fserv\n");
if (header_get_hops_number(buff, length)) initiate_params_reply(reqid, buff, length);
else params_reply(reqid, myself->Address, buff, length);

@ 



1.11 
log 

@*** empty log message *** 

@ 

text 
@d8 1 
a8 1 

 * $Header: /home/cvs/aspfs/topd/fserv.cc,v 1.10 2001/01/30 17:29:44 media Exp $
d43 1
a43 1
tfsPktType p_type[2] = {TFS_MSG_T2F_ZEROBLOCK_WRITE_REQ, TFS_MSG_T2F_SLICE_WRITE_REQ};
d45 2
a46 1
else PRINTD2("send zeroblock write request to fserv\n");

@ 



1.10 
log 

@*** empty log message *** 

@ 

text 
@d8 1 
a8 1 

 * $Header: /home/cvs/aspfs/topd/fserv.cc,v 1.9 2000/12/05 11:43:19 media Exp $
d54 7 

@ 



1.9 
log 

@*** empty log message *** 

@ 

text 
@d8 1 

a8 1
 * $Header: /home/cvs/aspfs/topd/fserv.cc,v 1.8 [DATE REDACTED] 09:38:18 media Exp $
d62 7 



1.8 
log 

@*** empty log message ***
@

text
@d8 1
a8 1
 * $Header: /usr/virtual/cvsroot/torfs/topd/fserv.cc,v 1.3 2000/11/29 12:21:34 media Exp $
d35 1
a35 1
C_fserv->PutQ(p->copy());

@ 



1.7 
log 

@*** empty log message *** 

@ 

text 

@@ 



1.6 
log 

@*** empty log message *** 

@ 

text 

@@ 



1.5 
log 

@*** empty log message *** 

@ 

text 

@@ 



1.4 
log 

@*** empty log message ***

@ 

text 

@@ 



1.3 
log 

@*** empty log message *** 

@ 

text 

@@ 



1.2 
log 

@*** empty log message ***

@

text
@d8 1
a8 1
 * $Header: /usr/virtual/cvsroot/torfs/topd/fserv.cc,v 1.2 [DATE REDACTED] 10:33:3 media Exp $
PRINTD2("send slice write request to fserv\n");
// if (type==TFS_MSG_C2T_BLOCK_WRITE_REQ) type=TFS_MSG_T2F_BLOCK_WRITE_REQ;
a61 7
}

int tfsTopd::fserv_write_reply(tfsRequestID reqid, char* buff, int length)
PRINTD2("got write reply from fserv\n");
if (header_get_hops_number(buff, length)) initiate_write_reply(reqid, buff, length);
else write_reply(reqid, myself->Address, buff, length);



1.1 
log 

@*** empty log message *** 

@ 

text 



Topological Server (TopD)



Version 1.2 of [DATE REDACTED]




The Topological Server maintains the connectivity of the network, is
responsible for determining and tracking changes in the network topology,
and assigns a set of neighbors to each computer. All interactions between
TorFS components are carried out through Topological Servers. A computer
on which a Topological Server is running is a TorFS node.

Thus, the primary task of the Topological Server is to execute TorFS
requests and to change and maintain the connections with its neighbors.

The Topological Server (hereafter TopD) executes the following requests:
opening a file (coverage search, SNAP), reading a slice (READ), and
writing a slice (WRITE).

On startup, TopD receives a list of neighbors (neighbour list). Some of
these neighbors are static, meaning that TopD will try to maintain a
permanent connection with them; the rest are dynamic, meaning that they
will be replaced by the dynamic configuration algorithm. The point of
this algorithm is that the dynamic neighbors of TopD should be the
closest TorFS nodes (by some criteria). Each neighbor in the list has its
own number, which is used for packet routing. Neighbor number 0 is TopD
itself.

If the connection with a static neighbor is interrupted, the availability
of that node is checked periodically, and as soon as the node becomes
available TopD reconnects to it as a static neighbor. It is also possible
to record availability statistics for such a neighbor and to warn the
administrator about the "unreliability" of a static neighbor.

The algorithm for changing the dynamic neighbors in the list is as
follows. Initially we have a list of at least one dynamic neighbor. We
obtain from that neighbor the IP addresses of its neighbors and send a
ping packet to each of those addresses, then select the neighbor with the
smallest response time and add it to the list. The server previously in
the list is removed. The operation is repeated for as long as a node with
a smaller response time than the current dynamic neighbor can be found
among the received IP addresses. Naturally, static neighbors are excluded
from the received IP list. Likewise, when there is more than one dynamic
neighbor, the other dynamic neighbors are also excluded from the received
IP lists.
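
The selection step of the dynamic neighbor algorithm can be sketched in C++ as follows. This is only an illustration under stated assumptions: the Candidate record, the function name pick_faster_neighbour, and the use of milliseconds are hypothetical and do not come from the TopD sources, and static neighbors are assumed to have been filtered out before the call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: a candidate address and its measured ping response time.
struct Candidate {
    std::string ip;
    double rtt_ms;
};

// Returns the index of the candidate with the smallest response time that is
// strictly smaller than that of the current dynamic neighbour, or -1 if no
// candidate is faster (in which case the replacement loop would stop).
int pick_faster_neighbour(const std::vector<Candidate>& candidates,
                          double current_rtt_ms) {
    int best = -1;
    double best_rtt = current_rtt_ms;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (candidates[i].rtt_ms < best_rtt) {
            best_rtt = candidates[i].rtt_ms;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A caller would repeat this step, each time asking the newly chosen neighbor for its own neighbor addresses, until the function returns -1.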

To store the requests being executed, TopD uses a request table. Each
entry contains the request ID (reqid, unique, generated on the client),
fid, bid, tid (or rid), the request type (op), the request stage (stage),
the timeout for that stage of the request, and some information specific
to the given request.
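
A request table of this kind might look like the following C++ sketch. It is an assumption-laden illustration, not the TopD implementation: the names RequestEntry, RequestTable and Op are hypothetical, reqid is assumed to fit in 64 bits, and the fid, bid and tid/rid fields are omitted because their C++ types are not given in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <map>

// Hypothetical request kinds; Proxy stands in for op=TOPD_PROXY.
enum class Op { Snap, Read, Write, Proxy };

// One row of the table (fid, bid, tid/rid left out, see the note above).
struct RequestEntry {
    Op op;
    int stage;
    std::time_t stage_deadline;  // absolute time at which this stage times out
};

class RequestTable {
public:
    // Inserts a new request; returns false if reqid is already present.
    bool add(std::uint64_t reqid, const RequestEntry& entry) {
        return entries_.emplace(reqid, entry).second;
    }
    // Returns the entry for reqid, or nullptr if it is not in the table.
    RequestEntry* find(std::uint64_t reqid) {
        auto it = entries_.find(reqid);
        return it == entries_.end() ? nullptr : &it->second;
    }
    // Removes every entry whose stage deadline has passed; returns the count.
    std::size_t expire(std::time_t now) {
        std::size_t removed = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.stage_deadline <= now) {
                it = entries_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }
private:
    std::map<std::uint64_t, RequestEntry> entries_;
};
```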



File opening (SNAP)

On receiving a coverage search request, TopD places the request in the
request table and initializes a server list and a coverage list (snap
list) for it. The server list stores the paths to the servers on which
slices were found; the snap list is used to store and analyze the
coverage.

TFS_MSG_C2T_SNAP_REQ "fr" - execute a coverage search request

The first stage (stage=1) of executing a coverage search request consists
of sending the request to all neighbors; the copy of the request sent to
the node itself goes to the file server, if one is running on this TorFS
node. Each neighboring TopD that receives the request then sends it to
its own file server and to all of its neighbors except the one from which
it arrived. The process then repeats. In this way the search request
spreads from one TorFS node across the whole network. To keep the network
from being flooded with such packets, both the number of hops (MAX HOP)
and the lifetime of the first stage of the request are limited. To
prevent packet loops, when a packet passes through a TorFS node, TopD
adds the request (with its reqid) to the request table, marks it as a
transit request (op=TOPD_PROXY), and forbids transit requests with that
reqid from passing through this TorFS node again for the lifetime of the
request.
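
The forwarding decision described above (flood to all neighbors except the sender, bounded by a hop limit, with repeated reqids suppressed) can be sketched as follows. The type Node, the method on_snap_request and the value chosen for MAX_HOP are hypothetical; the text names the limit but does not give its value, and the set of seen reqids stands in for the op=TOPD_PROXY entries of the request table.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

const int MAX_HOP = 8;  // assumed value; the source only names the limit

struct Node {
    std::set<std::uint64_t> seen;  // reqids already recorded on this node
    std::vector<int> neighbours;   // neighbour numbers (0 is the node itself)

    // Returns the neighbour numbers the request should be forwarded to.
    // `from` is the number of the neighbour that delivered the request
    // (0 when the request originates on this node).
    std::vector<int> on_snap_request(std::uint64_t reqid, int from, int hops) {
        std::vector<int> out;
        if (!seen.insert(reqid).second) return out;  // already passed through here
        if (hops >= MAX_HOP) return out;             // hop limit reached
        for (int n : neighbours) {
            if (n != from) out.push_back(n);
        }
        return out;
    }
};
```

An empty result means the packet is not forwarded any further; handing the request to the local file server is a separate step not shown here.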

All packets exchanged between TopDs contain, in addition to the request
body, a header needed for routing packets in TorFS. The header holds the
packet lifetime, the path traveled by the packet, and the length of that
path. This header is also passed to the file server and is returned
unchanged in the file server's reply.

It is important that the reqid that arrives at the node initiating the
coverage search is carried in all packets and remains unchanged until the
request completes; it is even passed on in a read request that uses this
coverage.

TFS_MSG_T2T_SNAP_REQ "Hfr" - these packets propagate the request from a
TopD to the neighboring TopDs

TFS_MSG_T2F_SNAP_REQ "Hfr" - passes the coverage search request to the
file server

On receiving a reply from the file server, TopD analyzes the header and
starts returning the reply to the initiator along the chain of TopDs.

TFS_MSG_F2T_SNAP_REPLY "Hftxm4(t4(bs))" - reply from the file server;
t is the maximum tid in the coverage, xm is the zero block, followed by
the list of slices in the coverage.

TFS_MSG_T2T_SNAP_REPLY "Hftxm4(t4(bs))" - return of the reply to the
initiator of the request.

The TopD that initiated the request receives replies from the file
servers of the network and enters them in the server list and the snap
list. When the stage=1 timeout expires, TopD sends the reply to the
client server that submitted the request.

TFS_MSG_T2C_SNAP_REPLY "ftxm4(tb)" - returns the coverage to the client

After that, the request moves to stage=2 and is kept in the request table
for some time in case a read request for this reqid follows.



Slice read (READ)

On receiving a read request, TopD finds the SNAP for this request in its
request table and looks up the slice numbers and the paths to the file
servers.

TFS_MSG_C2T_BLOCK_READ_REQ "fbt" - block read request

TopD then sends the required number of slice read requests to the file
servers found in the server list.

TFS_MSG_T2T_SLICE_READ_REQ "Hfbts" - passes a slice read request to the
required file server through the chain of TopDs

When the request reaches the last TopD in the chain, that TopD forwards
it to the file server.

TFS_MSG_T2F_SLICE_READ_REQ "Hfbts" - passes a slice read request to the
file server

TFS_MSG_F2T_SLICE_READ_REPLY "Hfbtsxm" - reply from the file server;
xm is the size of the slice and the slice itself

The reply is likewise returned along the chain to the initiator of the
read request.

TFS_MSG_T2T_SLICE_READ_REPLY "Hfbtsxm" - passes the reply along the chain
of TopDs

TFS_MSG_T2C_BLOCK_READ_REPLY "fbtsxm" - passes the reply to the client

Slice write (WRITE)

Currently under discussion...
TFS_MSG_C2T_BLOCK_WRITE_REQ "fbtsxm" -
TFS_MSG_T2T_SLICE_WRITE_REQ "Hfbtsxm" -
TFS_MSG_T2F_SLICE_WRITE_REQ "Hfbtsxm" -
TFS_MSG_C2T_ZEROBLOCK_WRITE_REQ "txm" -
TFS_MSG_T2T_ZEROBLOCK_WRITE_REQ "Htxm" -
TFS_MSG_T2F_ZEROBLOCK_WRITE_REQ "Htxm" -
TFS_MSG_F2T_ZEROBLOCK_WRITE_REPLY "Ht2" -
TFS_MSG_T2T_ZEROBLOCK_WRITE_REPLY "Ht2" -
TFS_MSG_T2C_ZEROBLOCK_WRITE_REPLY "t2" -



Packet routing

Packets in TorFS are routed using the packet header that TopDs exchange
with one another. The header has a variable length that depends on the
length of the path. The path is the ordered list of neighbor numbers
along the chain the packet has traveled. Each byte holds the number of
one neighbor, so the number of neighbors is limited to 255 (each node is
neighbor number 0 to itself).

Header format:

TTD         (reserved)  flag       count      hops       ... path
(4 bytes)   (1 byte)    (1 byte)   (1 byte)   (1 byte)   (hops bytes)

TTD (time to death) - the time at which the packet must cease to exist
(unnamed) - reserved byte
flag - direction of travel of the packet, forward or reverse
count - current position in the path
hops - total length of the path
path - the path
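
The header layout above can be illustrated with a small encode/decode sketch. The struct and function names are hypothetical, and the byte order of the 4-byte TTD field is not stated in the text, so big-endian (network order) is assumed here purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of the routing header.
struct Header {
    std::uint32_t ttd = 0;           // time to death
    std::uint8_t flag = 1;           // 1 = forward, 2 = reverse
    std::uint8_t count = 0;          // current position in the path
    std::uint8_t hops = 0;           // total length of the path
    std::vector<std::uint8_t> path;  // one neighbour number per byte
};

// Serialises the header; assumes h.path.size() == h.hops.
std::vector<std::uint8_t> encode(const Header& h) {
    std::vector<std::uint8_t> out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>(h.ttd >> shift));  // big-endian (assumed)
    }
    out.push_back(0);  // reserved byte
    out.push_back(h.flag);
    out.push_back(h.count);
    out.push_back(h.hops);
    out.insert(out.end(), h.path.begin(), h.path.end());
    return out;
}

// Parses a header; returns false if the buffer is shorter than 8 + hops bytes.
bool decode(const std::vector<std::uint8_t>& in, Header& h) {
    if (in.size() < 8) return false;
    const std::size_t hops = in[7];
    if (in.size() < 8 + hops) return false;
    h.ttd = (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16) |
            (std::uint32_t(in[2]) << 8) | std::uint32_t(in[3]);
    h.flag = in[5];
    h.count = in[6];
    h.hops = in[7];
    auto first = in.begin() + 8;
    h.path.assign(first, first + static_cast<std::ptrdiff_t>(hops));
    return true;
}
```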

Thus, while a coverage search request (SNAP) is being propagated, we can
record the path traveled by the packet. To do this, the initiating
machine creates a new header (flag=1, count=0, hops=0, path=NULL). A
neighboring TopD that receives the packet from the initiator increments
hops and count and appends to path the number of the neighbor from which
the packet arrived. The packets passed on to its neighbors already carry
the new header. The neighbors accept the packet with the new header and
repeat the procedure.



Fig. 1. Propagation of a coverage search request

Fig. 1 illustrates the propagation of packets. On receiving a packet, the
TopD on each machine updates the count and hops fields and appends to
path the number of the neighbor from which the packet was received.
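
The forward bookkeeping just described can be sketched as follows. RouteHeader, new_request_header and on_forward_receive are hypothetical names used for illustration; the TTD and reserved fields are left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of the routing header.
struct RouteHeader {
    std::uint8_t flag;
    std::uint8_t count;
    std::uint8_t hops;
    std::vector<std::uint8_t> path;
};

// Header created on the initiating machine: flag=1, count=0, hops=0, empty path.
RouteHeader new_request_header() {
    return RouteHeader{1, 0, 0, {}};
}

// Applied by a TopD that receives a forward packet from the neighbour it
// knows under the number `from`.
void on_forward_receive(RouteHeader& h, std::uint8_t from) {
    h.path.push_back(from);
    ++h.hops;
    ++h.count;
}
```

Applying on_forward_receive three times with the neighbor numbers 1, 3 and 7 yields count=3, hops=3 and path=1,3,7, the state used in the return example in this document.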

This header is also passed to the file server, which returns its contents
unchanged. Using the contents of the header, we can now send the packet
back. Suppose the TopD on node "E" has received a reply from its file
server. It sets flag=2 and analyzes the packet header: count=3,
path=1,3,7. The packet must be passed to neighbor number 7, that is, to
node D. On receiving the packet, the TopD on node D decrements count,
while hops remains unchanged. Now count=2, path=1,3,7. We send the packet
to neighbor number 3 (node B). In this way, when the packet arrives at
node A, count becomes 0 after being decremented by 1. TopD concludes that
the packet has returned to the initiator of the search, looks up the
request in the request table by reqid, and begins analyzing the reply.

There is one catch: if the contents of the path are not changed as the
packet travels back, it will later be impossible to determine the path to
the file server on node E, because this path is valid only from node E to
node A. For the route from node A to node E, the path must read 2,5,4. To
achieve this, as the packet travels from node E to node A, each TopD it
passes through replaces, before decrementing count, the value in the path
with the number of the neighbor from which the returning packet arrived.
Thus, in the packet leaving node D, the path toward node E will be 1,3,4.
The final path on node A will be 2,5,4.
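
The return procedure, including the path rewriting, can be traced with the sketch below. The names RouteHeader, start_return and on_return_receive are hypothetical. The neighbor numbers 4, 5 and 2 passed in the usage note are the ones implied by the paths 1,3,4 and 2,5,4 quoted in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of the routing header.
struct RouteHeader {
    std::uint8_t flag;
    std::uint8_t count;
    std::uint8_t hops;
    std::vector<std::uint8_t> path;
};

// Called by the TopD that received the reply from its own file server.
// Marks the packet as returning and yields the neighbour number to send to.
std::uint8_t start_return(RouteHeader& h) {
    assert(h.count >= 1 && static_cast<std::size_t>(h.count) <= h.path.size());
    h.flag = 2;
    return h.path[h.count - 1];
}

// Called by every TopD on the way back. `from` is the number under which
// this node knows the neighbour that delivered the returning packet.
// Returns true once the packet has reached the initiator; otherwise `next`
// receives the neighbour number the packet must be sent to.
bool on_return_receive(RouteHeader& h, std::uint8_t from, std::uint8_t& next) {
    assert(h.count >= 1 && static_cast<std::size_t>(h.count) <= h.path.size());
    h.path[h.count - 1] = from;  // rewrite the entry before decrementing count
    --h.count;
    if (h.count == 0) return true;
    next = h.path[h.count - 1];
    return false;
}
```

Starting from count=3, path=1,3,7 on node E, start_return selects neighbor 7 (node D). Calling on_return_receive on D with 4, on B with 5 and on A with 2 rewrites the path step by step to 1,3,4, then 1,5,4, then 2,5,4, and the last call reports that the initiator has been reached while hops stays at 3.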



Fig. 2. Return of the packet to the initiator

The next time (when the read is performed), the packet will follow the
path 2,5,4, with the values in the path being replaced along the way, so
that on node E the path is again 1,3,7, and the packet carrying the slice
that was read returns to node A along this path.

If a neighbor is disconnected or the connection to it is interrupted, the
neighbor is not removed from the list immediately. Instead it is marked
inactive, and all packets destined for it are discarded; it is removed
only after some time, once all requests that could have passed through
this connection have completed. New connections can then be placed in the
list in that neighbor's slot.

This is how packet routing is implemented.
Note:

The following symbols are used in the descriptions of request and reply
formats:

• 'H' - header of the exchange protocol between TopDs

• 'f' - FID

• 'r' - RID

• 'b' - BID

• 's' - SID

• 't' - TID

• 'xm' - size of the data chunk + the data chunk itself

• '4', '2', '1' - 4-, 2- and 1-byte request fields respectively, usually
used to pass the number of elements in a transmitted list



1. Description of the data types used in TorFS:

Type   Size   Description
FID    128    unique identifier of a file in the file ...
BID    32     identifier (number) of a block within a file
SID    16     identifier of a slice within a block
TID    64     transaction identifier (the first 4 bytes are the creation time ...
RID    64     time interval
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TonojiorHHecKHH cepBep (TopD) 

[TOPOLOGICAL SERVER TOPD] 

Версия 1.2 от [DATE REDACTED]

[VERSION 1.2 OF [DATE REDACTED]] 

Топологический сервер - поддерживает связность сети, отвечает за
определение и слежение за изменением топологии сети и выделение у
каждого компьютера набора соседей. Все взаимодействия между
компонентами TorFS осуществляются посредством топологических
серверов. Компьютер на котором запущен топологический сервер является
нодой TorFS.

[THE TOPOLOGICAL SERVER MAINTAINS THE CONNECTIVITY OF
THE NETWORK, AND IS RESPONSIBLE FOR DETERMINING AND
KEEPING TRACK OF CHANGES IN THE TOPOLOGY OF THE
NETWORK, AS WELL AS DEFINING, FOR EACH COMPUTER, A SET OF ITS
NEIGHBORS. ALL INTERACTIONS BETWEEN COMPONENTS OF TORFS
ARE CONDUCTED BY MEANS OF TOPOLOGICAL SERVERS. THE
COMPUTER ON WHICH THE TOPOLOGICAL SERVER IS LAUNCHED IS
A NODE OF TORFS.]

Таким образом основная задача топологического сервера - выполнение
запросов TorFS и изменение и поддержание связи с соседями.

[THUS, THE PRIMARY TASK OF THE TOPOLOGICAL SERVER IS
RESPONDING TO REQUESTS OF TORFS AND CHANGING AND
MAINTAINING THE CONNECTIONS TO THE NEIGHBORS.]

Топологический сервер (далее TopD) выполняет следующие запросы:
открытие файла (поиск покрытия, SNAP), чтение слайса (READ), запись
слайса (WRITE)

[THE TOPOLOGICAL SERVER (HEREAFTER, TOPD) RESPONDS TO THE
FOLLOWING REQUESTS: OPENING OF THE FILE (SEEKING OF
COVERAGE, SNAP), READING OF A SLICE (READ), WRITING OF A
SLICE (WRITE).]

При запуске TopD получает список соседей (neighbour list), некоторые из
этих соседей будут статическими, то есть TopD будет стараться
поддерживать постоянную связь с ними, остальные динамическими, то есть
они будут заменяться с помощью алгоритма динамического
конфигурирования. Суть этого алгоритма в том, чтобы динамическими
соседями TopD являлись наиболее близкие (по некоторым критериям) ноды
TorFS. Каждый сосед в списке имеет свой номер, который используется при
маршрутизации пакетов. Соседом номер 0 является сам TopD.

[UPON LAUNCH, THE TOPD RECEIVES A LIST OF NEIGHBORS, SOME
OF WHICH WILL BE STATIC, IN OTHER WORDS, TOPD WILL
ATTEMPT TO MAINTAIN PERMANENT COMMUNICATION WITH
THEM, AND OTHERS WILL BE DYNAMIC, IN OTHER WORDS, THEY
WILL BE SUBSTITUTED USING A MECHANISM OF DYNAMIC
CONFIGURATION. THE GIST OF THE ALGORITHM IS IN ENSURING
THAT THE DYNAMIC NEIGHBORS OF TOPD ARE THOSE NODES OF
TORFS THAT ARE THE CLOSEST (BY SOME CRITERIA). EACH
NEIGHBOR IN THE LIST HAS ITS OWN NUMBER, USED FOR PACKET
ROUTING. THE NEIGHBOR NUMBER 0 IS THE TOPD ITSELF.]

В случае если вдруг связь со статическим соседом прерывается, то
проводится периодическая проверка доступности этой ноды и как только
она станет доступной TopD присоединится к ней как к статическому соседу.
Возможно также записывать статистику доступности такого соседа, и
предупреждение администратора о "ненадежности" статического соседа.

[IN THE EVENT THAT CONNECTION TO A STATIC NEIGHBOR IS
INTERRUPTED, A PERIODIC CHECK IS MADE REGARDING THE
AVAILABILITY OF THIS NODE, AND AS SOON AS IT BECOMES
AVAILABLE AGAIN, TOPD WILL CONNECT TO IT AS TO A STATIC
NEIGHBOR. IT IS ALSO POSSIBLE TO LOG STATISTICS OF THE
AVAILABILITY OF SUCH A NEIGHBOR, AND TO WARN THE
ADMINISTRATOR OF THE "UNRELIABILITY" OF SUCH A NEIGHBOR.]

Алгоритм изменения динамических соседей в списке следующий. Для
начала мы имеем список хотя бы из одного динамического соседа. Далее мы
получаем от него IP адреса его соседей и посылаем пакет ping по этим IP,
далее выбираем из них соседей с наименьшим временем отклика и
добавляем его в список. Бывший в списке сервер удаляем. Далее операция
повторяется до тех пор пока не удастся найти среди полученного списка IP
ноды с меньшим временем отклика менее чем текущий динамический сосед.
Разумеется при получении списка IP из него исключаются статические
соседи. Так же при количестве динамических соседей больше одного, они
также исключаются из получаемых списков IP.

[THE ALGORITHM FOR CHANGING THE DYNAMIC NEIGHBORS IN THE
LIST IS AS FOLLOWS. AT FIRST, WE HAVE AN INITIAL LIST OF AT
LEAST ONE DYNAMIC NEIGHBOR. NEXT, WE RECEIVE FROM THAT
NEIGHBOR THE IP ADDRESSES OF ITS NEIGHBORS, AND SEND A PING
PACKET TO THESE IP ADDRESSES, THEN SELECT FROM THEM THE
NEIGHBOR WITH THE SMALLEST RESPONSE TIME AND ADD IT
TO THE LIST. THE SERVER FORMERLY PRESENT IN THE LIST IS
REMOVED. THEN, THE OPERATION CONTINUES FOR AS LONG AS IT IS
POSSIBLE TO FIND AMONG THE RECEIVED IP LIST A NODE WITH A
SMALLER RESPONSE TIME THAN THE CURRENT DYNAMIC
NEIGHBOR. OBVIOUSLY, UPON RECEIVING THE IP LIST, STATIC
NEIGHBORS ARE REMOVED FROM IT. ALSO, IF THE NUMBER OF
DYNAMIC NEIGHBORS IS GREATER THAN ONE, THEY ARE ALSO
REMOVED FROM THE IP LISTS BEING RECEIVED.]

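Для хранения исполняемых запросов TopD использует таблицу запросов
(request table). В ней содержатся ID запроса (reqid, уникальный,
генерирующийся на клиенте), fid, bid, tid (или rid), тип запроса (op), стадия
запроса (stage), таймаут для этой стадии запроса, и некоторая специфическая
для данного запроса информация.

[TO STORE THE REQUESTS BEING EXECUTED, TOPD USES A REQUEST
TABLE. IT CONTAINS THE REQUEST ID (reqid, UNIQUE, GENERATED ON
THE CLIENT), fid, bid, tid (OR rid), THE REQUEST TYPE (op), THE
REQUEST STAGE (stage), THE TIMEOUT FOR THAT STAGE, AND SOME
INFORMATION SPECIFIC TO THE GIVEN REQUEST.]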



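Открытие файла (SNAP) [FILE OPENING]

При получении запроса на поиск покрытия TopD помещает этот запрос в
request table, инициализирует для этого запроса список серверов (server list)
и список покрытия (snap list). Server list необходим для того, чтобы хранить
пути к серверам на которых были обнаружены слайсы, а snap list
используется для хранения и анализа покрытия.

[ON RECEIVING A COVERAGE SEARCH REQUEST, TOPD PLACES IT IN THE
REQUEST TABLE AND INITIALIZES A SERVER LIST AND A COVERAGE LIST
(SNAP LIST) FOR IT. THE SERVER LIST STORES THE PATHS TO THE
SERVERS ON WHICH SLICES WERE FOUND; THE SNAP LIST IS USED TO
STORE AND ANALYZE THE COVERAGE.]

TFS_MSG_C2T_SNAP_REQ "fr" - выполнить запрос на поиск покрытия
[EXECUTE A COVERAGE SEARCH REQUEST]

Первая стадия (stage=1) исполнения запроса на поиск покрытия,
заключается в рассылке всем соседям этого запроса, запрос же посланный
самому себе отправляется файловому серверу, если такой запущен на
данной ноде TorFS. Далее каждый соседний TopD, получивший этот запрос
посылает его на свой файловый сервер всем своих соседям, кроме того, от
которого он пришел. Далее все повторяется. Таким образом запрос на поиск
распространяется от одной ноды TorFS по всей сети. Для предотвращения
затопления сети такими пакетами ограничено количество шагов (MAX HOP)
и время жизни первой стадии запроса. Для предотвращения
закольцовывания (loop) пакетов при прохождении пакета через ноду TorFS
TopD добавляет этот запрос (с данным reqid) в request table и помечает это
запрос как транзитный (op=TOPD_PROXY) и запрещает повторное
прохождение транзитных запросов с данным reqid через эту ноду TorFS в
течении жизни этого запроса.

[THE FIRST STAGE (stage=1) OF A COVERAGE SEARCH CONSISTS OF
SENDING THE REQUEST TO ALL NEIGHBORS; THE COPY SENT TO THE NODE
ITSELF GOES TO THE FILE SERVER, IF ONE IS RUNNING ON THIS TORFS
NODE. EACH NEIGHBORING TOPD THAT RECEIVES THE REQUEST SENDS IT TO
ITS OWN FILE SERVER AND TO ALL ITS NEIGHBORS EXCEPT THE ONE IT CAME
FROM. THE PROCESS THEN REPEATS, SO THE REQUEST SPREADS FROM ONE
TORFS NODE ACROSS THE WHOLE NETWORK. TO PREVENT FLOODING, THE
NUMBER OF HOPS (MAX HOP) AND THE LIFETIME OF THE FIRST STAGE ARE
LIMITED. TO PREVENT PACKET LOOPS, A TOPD THROUGH WHICH A PACKET
PASSES ADDS THE REQUEST (WITH ITS reqid) TO THE REQUEST TABLE, MARKS
IT AS TRANSIT (op=TOPD_PROXY), AND FORBIDS TRANSIT REQUESTS WITH
THAT reqid FROM PASSING THROUGH THIS NODE AGAIN DURING THE
LIFETIME OF THE REQUEST.]

Все пакеты, которыми обмениваются между собой TopD, кроме тела запроса
содержат еще заголовок (header) необходимый для маршрутизации пакетов
в TorFS, они содержат время жизни пакета, путь пройденный пакетом и его
длину. Этот header передается и файловому серверу, и возвращается при
ответе от файлового сервера неизменным.

[ALL PACKETS EXCHANGED BETWEEN TOPDS CONTAIN, BESIDES THE
REQUEST BODY, A HEADER NEEDED FOR ROUTING PACKETS IN TORFS; IT
HOLDS THE PACKET LIFETIME, THE PATH TRAVELED BY THE PACKET, AND
THE LENGTH OF THAT PATH. THIS HEADER IS ALSO PASSED TO THE FILE
SERVER AND IS RETURNED UNCHANGED IN ITS REPLY.]

Важно, что тот reqid, который поступает на ноду, инициирующую поиск
покрытия, передается со всеми пакетами и остается неизменным до
окончания выполнения запроса и даже передается в запрос на чтение
использующий данное покрытие.

[IT IS IMPORTANT THAT THE reqid ARRIVING AT THE NODE THAT INITIATES
THE COVERAGE SEARCH IS CARRIED IN ALL PACKETS AND REMAINS
UNCHANGED UNTIL THE REQUEST COMPLETES; IT IS EVEN PASSED ON IN A
READ REQUEST THAT USES THIS COVERAGE.]

TFS_MSG_T2T_SNAP_REQ "Hfr" - с помощью этих пакетов происходит
распространение запроса от TopD к соседним TopD
[THESE PACKETS PROPAGATE THE REQUEST FROM A TOPD TO THE
NEIGHBORING TOPDS]

TFS_MSG_T2F_SNAP_REQ "Hfr" - передача запроса на поиск покрытия
файловому серверу
[PASSES THE COVERAGE SEARCH REQUEST TO THE FILE SERVER]

При получении ответа от файлового сервера TopD анализирует header и
начинает процесс возвращения ответа на запрос к инициатору по цепочке
TopD.

[ON RECEIVING A REPLY FROM THE FILE SERVER, TOPD ANALYZES THE
HEADER AND STARTS RETURNING THE REPLY TO THE INITIATOR ALONG THE
CHAIN OF TOPDS.]

TFS_MSG_F2T_SNAP_REPLY "Hftxm4(t4(bs))" - ответ от файлового
сервера, t - максимальный tid в покрытии, xm - нулевой блок, далее список
слайсов в покрытии.
[REPLY FROM THE FILE SERVER; t IS THE MAXIMUM tid IN THE COVERAGE,
xm IS THE ZERO BLOCK, FOLLOWED BY THE LIST OF SLICES IN THE
COVERAGE.]

TFS_MSG_T2T_SNAP_REPLY "Hftxm4(t4(bs))" - возвращение запроса к
инициатору запроса.
[RETURN OF THE REPLY TO THE INITIATOR OF THE REQUEST.]

TopD инициировавший запрос получает ответы от файловых серверов сети,
заносит их в server list и snap list. По истечении таймаута на stage=1, TopD
выдает ответ клиентскому серверу, приславшему этот запрос.

[THE TOPD THAT INITIATED THE REQUEST RECEIVES REPLIES FROM THE FILE
SERVERS OF THE NETWORK AND ENTERS THEM IN THE SERVER LIST AND THE
SNAP LIST. WHEN THE stage=1 TIMEOUT EXPIRES, TOPD SENDS THE REPLY TO
THE CLIENT SERVER THAT SUBMITTED THE REQUEST.]

TFS_MSG_T2C_SNAP_REPLY "ftxm4(tb)" - возвращает покрытие клиенту
[RETURNS THE COVERAGE TO THE CLIENT]

После этого этот запрос переходит в stage=2 и хранится некоторое время в
request table на случай если будет выполняться запрос на чтение данного
reqid.

[AFTER THAT, THE REQUEST MOVES TO stage=2 AND IS KEPT IN THE REQUEST
TABLE FOR SOME TIME IN CASE A READ REQUEST FOR THIS reqid FOLLOWS.]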



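Чтение слайса (READ) [SLICE READ]

При получении запроса на чтение TopD находит SNAP для этого запроса у
себя в request table, находит номера слайсов и пути к файловым серверам

[ON RECEIVING A READ REQUEST, TOPD FINDS THE SNAP FOR THIS REQUEST
IN ITS REQUEST TABLE AND LOOKS UP THE SLICE NUMBERS AND THE PATHS TO
THE FILE SERVERS.]

TFS_MSG_C2T_BLOCK_READ_REQ "fbt" - запрос на чтение блока
[BLOCK READ REQUEST]

Далее TopD посылает необходимое число запросов на чтение слайсов
найденным в server list файловым серверам

[TOPD THEN SENDS THE REQUIRED NUMBER OF SLICE READ REQUESTS TO THE
FILE SERVERS FOUND IN THE SERVER LIST.]

TFS_MSG_T2T_SLICE_READ_REQ "Hfbts" - передача запроса на чтение
слайса к нужному файловому серверу через цепочку TopD
[PASSES A SLICE READ REQUEST TO THE REQUIRED FILE SERVER THROUGH
THE CHAIN OF TOPDS]

После того как запрос доходит до последнего в цепочке TopD, тот
пересылает запрос файловому серверу.

[WHEN THE REQUEST REACHES THE LAST TOPD IN THE CHAIN, THAT TOPD
FORWARDS IT TO THE FILE SERVER.]

TFS_MSG_T2F_SLICE_READ_REQ "Hfbts" - передача запроса на чтение
слайса файловому серверу
[PASSES A SLICE READ REQUEST TO THE FILE SERVER]

TFS_MSG_F2T_SLICE_READ_REPLY "Hfbtsxm" - ответ от файлового
сервера, xm - размер слайса и сам слайс
[REPLY FROM THE FILE SERVER; xm IS THE SIZE OF THE SLICE AND THE
SLICE ITSELF]

Ответ так же по цепочке возвращается инициатору запроса на чтение.

[THE REPLY IS LIKEWISE RETURNED ALONG THE CHAIN TO THE INITIATOR OF
THE READ REQUEST.]

TFS_MSG_T2T_SLICE_READ_REPLY "Hfbtsxm" - передача ответа по
цепочке TopD
[PASSES THE REPLY ALONG THE CHAIN OF TOPDS]

TFS_MSG_T2C_BLOCK_READ_REPLY "fbtsxm" - передача ответа
клиенту
[PASSES THE REPLY TO THE CLIENT]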



Запись слайса (WRITE) [SLICE WRITE]
в данный момент находится в процессе обсуждения...
[CURRENTLY UNDER DISCUSSION...]
TFS_MSG_C2T_BLOCK_WRITE_REQ "fbtsxm" -
TFS_MSG_T2T_SLICE_WRITE_REQ "Hfbtsxm" -
TFS_MSG_T2F_SLICE_WRITE_REQ "Hfbtsxm" -
TFS_MSG_C2T_ZEROBLOCK_WRITE_REQ "txm" -
TFS_MSG_T2T_ZEROBLOCK_WRITE_REQ "Htxm" -
TFS_MSG_T2F_ZEROBLOCK_WRITE_REQ "Htxm" -
TFS_MSG_F2T_ZEROBLOCK_WRITE_REPLY "Ht2" -
TFS_MSG_T2T_ZEROBLOCK_WRITE_REPLY "Ht2" -
TFS_MSG_T2C_ZEROBLOCK_WRITE_REPLY "t2" -



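Маршрутизация пакетов [PACKET ROUTING]

Для маршрутизации пакетов в TorFS используется заголовок (header) пакета,
которым обмениваются между собой TopD. Заголовок имеет переменную
длину зависящую от длины пути. Путь - это последовательный список
номеров соседей по цепочке которых шел пакет. Каждый байт обозначает
номер одного из соседей, таким образом кол-во соседей ограничено цифрой
255 (каждая нода является для себя соседом номер 0).

[PACKETS IN TORFS ARE ROUTED USING THE PACKET HEADER THAT TOPDS
EXCHANGE WITH ONE ANOTHER. THE HEADER HAS A VARIABLE LENGTH THAT
DEPENDS ON THE LENGTH OF THE PATH. THE PATH IS THE ORDERED LIST OF
NEIGHBOR NUMBERS ALONG THE CHAIN THE PACKET HAS TRAVELED. EACH BYTE
HOLDS THE NUMBER OF ONE NEIGHBOR, SO THE NUMBER OF NEIGHBORS IS
LIMITED TO 255 (EACH NODE IS NEIGHBOR NUMBER 0 TO ITSELF).]

Формат заголовка: [HEADER FORMAT]

TTD         (reserved)  flag       count      hops       ... path
(4 bytes)   (1 byte)    (1 byte)   (1 byte)   (1 byte)   (hops bytes)

TTD (time to death) - это время в которое пакет должен прекратить свое
существование. [THE TIME AT WHICH THE PACKET MUST CEASE TO EXIST]
(unnamed) - резервный байт [RESERVED BYTE]
flag - направление прохода пакета, прямое или обратное
[DIRECTION OF TRAVEL OF THE PACKET, FORWARD OR REVERSE]
count - позиция в пути в данный момент [CURRENT POSITION IN THE PATH]
hops - общая длина в пути [TOTAL LENGTH OF THE PATH]
path - путь [THE PATH]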

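Таким образом при распространении запроса на поиск покрытия (SNAP) мы
можем запоминать путь пройденный пакетом. Для этого на инициирующей
машине, порождается новый заголовок (flag=1, count=0, hops=0,
path=NULL). Далее соседский TopD, получивший пакет от инициатора,
увеличивает hops и count, добавляет в path номер соседа от которого пришел
пакет. Пакеты передаваемые соседям уже содержат новый заголовок.
Соседи принимают пакет с новым заголовком и повторяют эту процедуру.

[THUS, WHILE A COVERAGE SEARCH REQUEST (SNAP) IS BEING PROPAGATED,
WE CAN RECORD THE PATH TRAVELED BY THE PACKET. THE INITIATING
MACHINE CREATES A NEW HEADER (flag=1, count=0, hops=0, path=NULL). A
NEIGHBORING TOPD THAT RECEIVES THE PACKET FROM THE INITIATOR
INCREMENTS hops AND count AND APPENDS TO path THE NUMBER OF THE
NEIGHBOR FROM WHICH THE PACKET ARRIVED. PACKETS PASSED ON TO ITS
NEIGHBORS ALREADY CARRY THE NEW HEADER. THE NEIGHBORS ACCEPT THE
PACKET WITH THE NEW HEADER AND REPEAT THE PROCEDURE.]



рис.1 Распространение запроса на поиск покрытия

[fig. 1 distribution of request for coverage]

На рис.1 проиллюстрировано распространение пакетов. TopD на каждой из
машин при получении пакета изменяет поля count и hops и добавляет в path
номер соседа от которого был получен пакет.

[FIG. 1 ILLUSTRATES THE PROPAGATION OF PACKETS. ON RECEIVING A
PACKET, THE TOPD ON EACH MACHINE UPDATES THE count AND hops FIELDS
AND APPENDS TO path THE NUMBER OF THE NEIGHBOR FROM WHICH THE
PACKET WAS RECEIVED.]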

This header is also passed to the file server, which returns its contents
unchanged. Using the contents of this header, we can now send the packet back.
Suppose the TopD on node "E" has received a response from its file server. It
sets flag=2 and examines the packet header: count=3, path=1,3,7. The packet
must be passed to neighbor number 7, that is, to node D. On receiving the
packet, the TopD on node D decrements count, while hops remains unchanged.
Now count=2, path=1,3,7. The packet is sent to neighbor number 3 (node B).
Thus, when the packet arrives at node A, count becomes 0 after being
decremented by 1. The TopD concludes that the packet has returned to the
initiator of the search, looks up the given request in the request table by
its reqid, and begins analyzing the response.



There is one "but": if the contents of the path are not changed during the
packet's return trip, it will later be impossible to determine the path to the
file server on node E, because this path is valid only from node E to node A.
For the route from node A to node E, the path must look like this: 2,5,4. To
achieve this, as the packet travels from node E to node A, each TopD through
which the packet passes replaces, before decrementing count, the value in the
path with the number of the neighbor from which the returning packet arrived.
Thus, in the packet leaving node D after its arrival from node E, the path
will be 1,3,4. The final path at node A will be 2,5,4.
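The return trip with path rewriting can be traced with a short sketch. The function names start_return and on_return_receive are hypothetical. The local neighbor numbers used for the reverse direction (4 at D, 5 at B, 2 at A) are inferred from the final path 2,5,4 given in the text.

```python
def start_return(header):
    """At the far end (node E): set flag=2 and pick the first hop back."""
    updated = {**header, "flag": 2}
    return updated, updated["path"][updated["count"] - 1]


def on_return_receive(header, from_neighbor):
    """One return-trip step at an intermediate TopD or at the initiator.

    The current path entry is overwritten with the number of the neighbor
    the packet came from, and only then is count decremented.
    Returns (header, next_neighbor); next_neighbor is None at the initiator.
    """
    path = list(header["path"])
    count = header["count"]
    path[count - 1] = from_neighbor
    count -= 1
    updated = {**header, "count": count, "path": path}
    next_neighbor = path[count - 1] if count > 0 else None
    return updated, next_neighbor


header = {"flag": 1, "count": 3, "hops": 3, "path": [1, 3, 7]}
header, to_d = start_return(header)            # E sends to neighbor 7 (node D)
header, to_b = on_return_receive(header, 4)    # D heard from E on neighbor 4
path_after_d = list(header["path"])            # 1,3,4 as stated in the text
header, to_a = on_return_receive(header, 5)    # B heard from D on neighbor 5
header, done = on_return_receive(header, 2)    # A heard from B on neighbor 2
```

At the end count is 0, hops is still 3, and the path reads 2,5,4, so node A now holds a route it can follow toward node E.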

Fig. 2. Return of the packet to the initiator



The next time (when a read is performed), the packet will follow the path
2,5,4, changing the values in the path along the way, so that at node E the
path will again be 1,3,7, and the packet carrying the slice that was read will
return to node A along this path.
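The text does not spell out the per-node rule for this directed forward pass. One symmetric rule that reproduces the stated result is sketched below, purely as an assumption: each TopD overwrites the entry at position count with the number of the neighbor the packet came from, increments count, and forwards to the next entry. The function name on_directed_receive is hypothetical, and the flag field is left out because its value for this pass is not given.

```python
def on_directed_receive(header, from_neighbor):
    """Assumed mirror image of the return-trip rule, for a known route.

    Returns (header, next_neighbor); next_neighbor is None at the destination.
    """
    path = list(header["path"])
    count = header["count"]
    path[count] = from_neighbor
    count += 1
    updated = {**header, "count": count, "path": path}
    next_neighbor = path[count] if count < header["hops"] else None
    return updated, next_neighbor


header = {"count": 0, "hops": 3, "path": [2, 5, 4]}
first_hop = header["path"][0]                    # A sends to neighbor 2 (node B)
header, n1 = on_directed_receive(header, 1)      # B heard from A on neighbor 1
header, n2 = on_directed_receive(header, 3)      # D heard from B on neighbor 3
header, n3 = on_directed_receive(header, 7)      # E heard from D on neighbor 7
```

Under this rule node E ends up with count=3 and path=1,3,7, the same state from which the return trip in the text starts.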

If a neighbor is disconnected or the link to it is interrupted, the neighbor
is not removed from the list immediately. Instead it is marked as inactive,
and all packets destined for it are discarded. It is removed only after some
time, once all requests that could have passed through this connection have
completed. After that, new connections can be placed in the list in this
neighbor's slot.
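A minimal sketch of this delayed removal is given below. The class name NeighborTable, the fixed grace period, and the use of a plain list as a stand-in for a connection are all assumptions; the document only says that the slot is freed "after some time", without stating how that time is chosen.

```python
import time


class NeighborTable:
    """Neighbor list whose slots are freed only after a grace period."""

    def __init__(self, grace_seconds, clock=time.monotonic):
        self.grace = grace_seconds
        self.clock = clock
        self.slots = {}  # neighbor number -> [connection, inactive_since]

    def add(self, number, conn):
        # A slot cannot be reused while its old neighbor is still listed.
        if number in self.slots:
            raise ValueError("slot %d is still occupied" % number)
        self.slots[number] = [conn, None]

    def mark_disconnected(self, number):
        # Keep the entry, but remember when it became inactive.
        self.slots[number][1] = self.clock()

    def route(self, number, packet):
        """Deliver a packet, or drop it if the neighbor is absent or inactive."""
        slot = self.slots.get(number)
        if slot is None or slot[1] is not None:
            return False
        slot[0].append(packet)
        return True

    def purge(self):
        """Free slots whose grace period has elapsed; return their numbers."""
        now = self.clock()
        expired = [n for n, (_, since) in self.slots.items()
                   if since is not None and now - since >= self.grace]
        for n in expired:
            del self.slots[n]
        return expired
```

Keeping the inactive entry in place means that neighbor numbers already recorded in packet paths are not silently reassigned to a different machine while those packets are still in flight.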



This is how packet routing is implemented.

Note:

The following symbols are used in the descriptions of request and response
formats:

• 'H' - header of the exchange protocol between TopDs
• 'f' - FID
• 'r' - RID
• 'b' - BID
• 's' - SID
• 't' - TID
• 'm' - size of the data slice + the slice itself
• '4', '2', '1' - request fields of 4, 2 and 1 bytes respectively, usually
  used to transmit the number of elements in the list being transmitted

1. Description of data types used in TorFS:

Type   Size   Description

FID    128    Unique file identifier in the file system
BID    32     Identifier (number) of the block in the file
SID    16     Identifier of the slice in the block
TID    64     Identifier of the transaction (the first 4 bytes are the
              time of creation of the transaction)
RID    64     Time interval
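
The symbols and type sizes above can be combined into a small helper that computes how many bytes the fixed-size part of a message format occupies. Several things here are assumptions rather than statements from the document: that the sizes in the table are in bits (consistent with a 64-bit TID whose first 4 bytes hold a time), that formats are written as strings of concatenated symbols, that 'f' and 't' are the letters for FID and TID, that the creation time is stored in network byte order, and the names fixed_length and tid_creation_time.

```python
import struct

# Fixed-size symbols mapped to byte lengths (table sizes assumed to be bits).
FIELD_BYTES = {"f": 16, "b": 4, "s": 2, "t": 8, "r": 8,
               "4": 4, "2": 2, "1": 1}
# Symbols whose length depends on the message: protocol header and data slice.
VARIABLE = {"H", "m"}


def fixed_length(fmt):
    """Return (bytes in fixed-size fields, number of variable-size fields)."""
    unknown = [c for c in fmt if c not in FIELD_BYTES and c not in VARIABLE]
    if unknown:
        raise ValueError("unknown format symbols: %r" % unknown)
    fixed = sum(FIELD_BYTES[c] for c in fmt if c in FIELD_BYTES)
    variable = sum(1 for c in fmt if c in VARIABLE)
    return fixed, variable


def tid_creation_time(tid):
    """Read the creation time held in the first 4 bytes of an 8-byte TID."""
    if len(tid) != 8:
        raise ValueError("a TID is 8 bytes long")
    return struct.unpack("!I", tid[:4])[0]
```

For a hypothetical format "Hfbs2m", the fixed-size fields add up to 16 + 4 + 2 + 2 = 24 bytes, with two variable-size parts (the header and the slice) on top of that.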

Computer - server of the system. The Top, Directory and File servers are
started on it.

Computer - client of the system. The Client Server is started on it.